Hybrid ART/Kohonen Neural Model for Document Image Compression

Hamdy S. Soliman and Mohammed Omari
Computer Science Department, NMT, Socorro, New Mexico 87801
{hss, omari01}@nmt.edu

Keywords: Learning Algorithms and Training; Adaptive Learning Algorithms; Network Resonance; ART; Applications of Neural Networks; Comparisons of Networks; Data Compression; Experimental Design; Hybrid Artificial Neural Networks; Image Compression; Kohonen Networks; Minimum Distance Classifier; Neural Network Algorithms/Applications/Architecture; Neural Network Training; Self-Adaptive Training; Self-Organizing Neural Networks; Signal-to-Noise Ratio; Testing Importance; Training Vector Quantization; Unsupervised Classification/Learning; Vector Quantization; Wavelets; Winner-Take-All.

Abstract

A new neural direct classification model, DC, is introduced for image data compression. It is based on the Adaptive Resonance Theory (ART1) and the Kohonen Self-Organizing Feature Map (KSOFM) neural models. The input subimage domain is clustered into classes of similar subimages. Each class has a center-of-mass subimage representative, called a centroid. The main goal of the DC training/compression phase is to develop a table of all class centroids (the codebook) and to map each input subimage to the index of its class centroid in the codebook. The process of deciding the membership of an input subimage in some hosting class is similar to the ART1 mechanism. However, the updating of the host class centroid is related to the KSOFM approach.

An input image is compressed into a set of centroid indices, one per subimage, plus a local codebook. Compression is realized because the bit size of an input subimage is an order of magnitude larger than that of its encoding centroid index. The DC model has achieved much better results than state-of-the-art peer document compression techniques, e.g., JPEG, DjVu, Wavelet, etc.

I. Introduction

Uncompressed multimedia (graphics, audio and video) data require considerable storage capacity and transmission bandwidth. Thus, image compression is a very important factor for better bandwidth utilization over a communication network. The compression process is based on redundancy and irrelevancy reduction, both of which are associated with the image domain. Redundancy reduction aims at removing duplication from the signal source (image/video), whereas irrelevancy reduction omits parts of the signal that will not be noticed by the signal receiver, namely the Human Visual System (HVS). There are two well-known major approaches to the implementation of image compression: the discrete cosine transform technique (e.g., Wavelet, JPEG, DjVu, etc.) and neural network models (e.g., counterpropagation, Kohonen, backpropagation, adaptive resonance theory, etc.). In the discrete cosine transform approach, compression is accomplished by applying a linear transform to de-correlate the image data (source encoder), quantizing the resulting transform coefficients (quantizer), and entropy-coding the quantized values (entropy encoder). In the neural network approach, a set of input subimage vectors is presented to the net in order to train its neurons' set of weight vectors (the weight matrix) via a process of weight adjustment.
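For illustration, the transform-coding pipeline just described (transform, quantize, entropy-code) can be sketched in a few lines of Python. This is not the DC model; the 8x8 block size and the quantization step q are arbitrary illustrative values, and the entropy-coding stage is only indicated in a comment.

```python
# Minimal sketch of the transform-coding pipeline on one 8x8 block (illustrative only).
import numpy as np
from scipy.fft import dct, idct

def dct2(block):
    # 2-D type-II DCT: source encoder, de-correlates the block
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)  # one 8x8 subimage

q = 16.0                                       # uniform quantizer step (illustrative)
coeffs = dct2(block)                           # transform
quantized = np.round(coeffs / q).astype(int)   # quantizer
# An entropy encoder (e.g., Huffman or arithmetic coding) would now code
# "quantized"; the decoder simply reverses the steps:
restored = idct2(quantized * q)
print("max abs error:", np.abs(block - restored).max())
```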

The neural network approach encompassing the use of the counterpropagation network (CPN) and the Kohonen feature map (KSOFM) models to implement data compression was first suggested by Kohonen and Hecht-Nielsen [Kohonen 1982; Kohonen 1984; Hecht-Nielsen 1987; Hecht-Nielsen 1988]. We introduce a new neural model of direct classification (DC) that is based on the philosophy of ART (adaptive resonance theory) and the KSOFM. Our DC model combines the advantages of these two neural net models in order to achieve high (decompressed) image quality at a high compression ratio. The DC model performed exceptionally well in the document image domain, compared to the peer "state-of-the-art" models (e.g., Wavelet and DjVu). In Section II, we briefly introduce the above neural models. Our DC model is defined in Section III. In Section IV, we cover experiments using the traditional peer models versus our DC model, along with a discussion of the significance of the obtained results. Section V contains our conclusion and future work.

II. Utilized Neural Network Models

The AVQ theory is one of the most recent techniques used in the domain of image compression [1][2][3][7][11][12]. In the AVQ-based image compression approach, the input image is divided into equal-size sections (subimages), where each section of N² = H×W pixels is considered as a vector in the encoding space R^(H×W). The neural net clusters the subimages into classes of similar subimages. Each class has a representative vector, the centroid, that represents any of its member subimages. All centroid representatives for an image are tabled in a lookup table, or codebook. Then, each subimage is compressed into the index of its corresponding class representative in the codebook. Thus, the compressed image is a set of representative indices that represent (in order) its original subimages. The compression is realized because the size of the original subimage (N² pixels) is an order of magnitude larger than its corresponding centroid index (log2|codebook| bits).
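A minimal sketch of this vector-quantization encoding, assuming a codebook is already available (the DC model described below is what actually builds one); the block size N, the codebook size L, and the random data are illustrative only.

```python
# Sketch: map each NxN subimage to its nearest codebook centroid and report
# the per-subimage compression ratio (codebook overhead ignored).
import numpy as np

def encode_vq(image, codebook, N=4):
    """Map each NxN subimage to the index of its nearest centroid."""
    H, W = image.shape
    indices = []
    for r in range(0, H - H % N, N):
        for c in range(0, W - W % N, N):
            s = image[r:r+N, c:c+N].astype(float).ravel()   # subimage vector
            dists = np.linalg.norm(codebook - s, axis=1)     # distortion ||X - Y||
            indices.append(int(np.argmin(dists)))            # winning centroid index
    return np.array(indices)

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
L = 256                                                      # codebook entries
codebook = rng.integers(0, 256, size=(L, 16)).astype(float)  # 16 = N*N
idx = encode_vq(image, codebook, N=4)

bits_per_subimage = 16 * 8                    # N^2 pixels, 8 bits each
bits_per_index = int(np.ceil(np.log2(L)))     # log2|codebook|
print("compression ratio (ignoring codebook overhead):",
      bits_per_subimage / bits_per_index)     # 128/8 = 16:1 here
```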

The Kohonen self-organizing feature map (KSOFM) is a structured network with a two-dimensional array of nodes, in which an adaptive partner weight vector is associated with every node [1][2][3]. Based on the AVQ theory, the KSOFM initializes the weight matrix randomly. The input subimage vectors are presented sequentially to the network [9][10]. During the training process, the weight vector nearest to the input vector (the winner) is adaptively moved closer to that input vector. Consequently, with every introduction of an input subimage vector, the winning vector moves toward some theoretical class's center of mass (centroid), on its way to becoming the true representative of all the similar subimages for which it has won. The input vector is associated with the class of its corresponding representative centroid. The input subimage vectors are presented in epochs until no change of membership is observed.
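A minimal sketch of the winner-take-all update at the heart of the KSOFM training just described; the learning rate lr and the dimensions are illustrative, and the neighborhood updates of a full SOM are omitted.

```python
# Winner-take-all step: the nearest weight vector is moved a fraction lr closer
# to the input vector (illustrative values; no neighborhood function).
import numpy as np

def ksofm_step(weights, x, lr=0.1):
    """One training step: move the winning weight vector toward input x."""
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))  # nearest weight vector
    weights[winner] += lr * (x - weights[winner])             # move it closer to x
    return winner

rng = np.random.default_rng(2)
weights = rng.random((8, 16))          # 8 nodes, 16-dimensional weight vectors
for _ in range(3):                     # a few "epochs" over toy subimage vectors
    for x in rng.random((20, 16)):
        ksofm_step(weights, x)
```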

The adaptive resonance theory 1 (ART1) network was developed by Carpenter and Grossberg (1987, 1988) [4][5][6]. It serves the purpose of cluster discovery. Like networks using a single Kohonen layer with competitive learning neurons, this network learns clusters in an unsupervised mode. The novel property of the ART1 network is the controlled discovery of clusters. In addition, the ART1 network can accommodate new clusters without affecting the storage or recall capabilities for clusters already learned (the elasticity/plasticity property). Essentially, the network follows the leader: it originates the first cluster with the first input pattern received, and it then creates a second cluster if the distance of the second pattern from the first exceeds a certain threshold; otherwise the pattern is clustered with the first cluster. These features of elasticity and a single (one-epoch) training cycle truly characterize the ART model.

III. Direct Classification Model (DC)

Based on the AVQ theory, we designed our new Direct Classification neural net engine for image compression/decompression. It follows the winner-take-all feature of the Kohonen model, and the elasticity and single-epoch training cycle features of the ART1 model. The advantage of the DC over Kohonen is that the input domain is presented only once to the DC system. Therefore, the asymptotic training time complexity is O(n), where n is the size of the input domain of subimages; a huge reduction from the original KSOFM time complexity of O(n²). The traditional ART1 generates small real numbers in the weight matrix, which might cause classification deviation in the case of low system precision. Instead, our DC model develops class representatives (centers of mass), called centroids, which are of the same type as the input subimages (integer vectors). Moreover, both the Kohonen and ART1 models use training input sets of much smaller size than the total input domain; therefore, the codebook is a good representative only of the training subimages, not of the total subimage domain. Such a limitation does not exist in our DC model. Our work is in the same area as the well-known Wavelet and JPEG image compression techniques, with a major difference in the formation of the lookup tables.

The manufacturing of the DC's lookup tables is carried out via the training of a hybrid neural model of the SOFM and ART nets, with some modifications. Our model is a vector quantizer (VQ), which encodes subimage vectors by mapping many similar k-dimensional input vectors (with respect to a given distortion measure) into one representative codeword (centroid) vector. The similarity measure of vectors X and Y is based on the distance ||X - Y|| (distortion) between X and Y. A collection of codewords is stored in a lookup table (codebook), which is utilized later in the lookup process of the decompression phase. The manufacturing and the nature of the codebook are the key distinctions of our work from other peer mechanisms.

The training subphase (Fig. 1) starts by dividing the total subimage vector domain into classes (clusters) of similar subimages. Each class has a centroid that represents the center of mass (average) of all of its member subimages, and its index is used to replace each of them in the compressed image file. As shown in Fig. 2, the mechanism incrementally constructs a codebook of centroids to be placed in the DC neurons' synaptic weight vectors. Starting with a set of empty synaptic weights (an empty codebook), a very simplified ART approach is adopted to build up the codebook table. For each input subimage S, we adaptively train the DC synaptic weights based on the SOFM "winner-take-all" concept. The first step in the adjustment process is to find the centroids closest to S among the active entries of the codebook, obtaining a set of possible winning centroids, PWCS. This is done based on an acceptable pixel difference (within the MaxIntDiff threshold) between S and the codebook entries. Failing to find any possible winning centroid (i.e., an empty PWCS) forces the system to add a new cluster with S as its centroid, given that there is an available entry in the codebook. If not, the system considers the whole codebook as S's PWCS. Then, among the obtained PWCS, a second process is initiated to find the final winning centroid, WC, for S.

WC should have the smallest mean error (ME) with S among all members of the PWCS. In order to maintain the adaptation feature of the KSOFM model, S is used to adjust WC so that it represents the average of its class members, including S. Another DC variation from the Kohonen model is that only a subset of the class members is involved in the adjustment of the centroid. The DC philosophy is that only the first k classified subimages adjust the centroid. This approach helps stabilize the centroid's ability to accurately represent the class core, based on spatial locality. After adjusting WC with the k spatially consecutive input subimages, the (k+1)-th input subimage member presents two possibilities. First, if it is very similar to the centroid, then not involving it in the centroid update does not hurt the process. On the other hand, if it is not similar, we do not want it to distort the true centroid representation of the previously classified k subimages. Thus, the process of adjusting WC is performed only while its class size is still below a certain threshold (MaxTrainSet). The compression subphase simply assigns the index of WC in the codebook table as S's compression index (CI). The indices are stored in the same sequence in which the training subimages were introduced. The result of the previous operations (subphases) is a codebook table of fully trained neuron synaptic weight vectors (centroids), and an indices sequence that represents the compression of the original subimages of the input image file, with each subimage denoted by its corresponding centroid representative. The codebook is appended to the indices sequence, forming the final compressed file. The compression is achieved due to the huge (order-of-magnitude) reduction in the subimage representation.
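The following is a hedged Python sketch of the DC training/compression subphase described above, using the paper's parameters L, MaxIntDiff, and MaxTrainSet. MaxIntDiff is interpreted here as the maximum per-pixel intensity difference allowed between S and a candidate centroid, which is one reading of the text; the data and parameter values are illustrative.

```python
# Sketch of the DC training/compression loop (one pass over the subimages).
import numpy as np

def dc_train_compress(subimages, L=256, max_int_diff=50, max_train_set=10):
    """Return (codebook, indices) for a sequence of flattened subimage vectors."""
    codebook = []        # centroid vectors
    class_size = []      # number of members that have adjusted each centroid
    indices = []
    for S in subimages:
        S = S.astype(float)
        if codebook:
            cb = np.array(codebook)
            # PWCS: centroids whose pixel-wise difference from S stays within MaxIntDiff
            pwcs = np.where(np.max(np.abs(cb - S), axis=1) <= max_int_diff)[0]
        else:
            pwcs = np.array([], dtype=int)

        if pwcs.size == 0:
            if len(codebook) < L:               # room left: open a new cluster with S
                codebook.append(S.copy())
                class_size.append(1)
                indices.append(len(codebook) - 1)
                continue
            pwcs = np.arange(len(codebook))     # codebook full: consider all entries

        # Winning centroid WC: smallest mean error with S among the PWCS
        cb = np.array(codebook)
        me = np.mean(np.abs(cb[pwcs] - S), axis=1)
        wc = int(pwcs[np.argmin(me)])

        # KSOFM-like adjustment: keep WC the running average of its early members,
        # but only while the class is still smaller than MaxTrainSet
        if class_size[wc] < max_train_set:
            n = class_size[wc]
            codebook[wc] = (codebook[wc] * n + S) / (n + 1)
            class_size[wc] += 1
        indices.append(wc)
    return np.array(codebook), np.array(indices)

# Example: 4x4 subimages taken from a random stand-in "document" image
rng = np.random.default_rng(3)
img = rng.integers(0, 256, size=(64, 64))
blocks = [img[r:r+4, c:c+4].ravel() for r in range(0, 64, 4) for c in range(0, 64, 4)]
codebook, indices = dc_train_compress(blocks)
print(len(codebook), "centroids,", len(indices), "indices")
```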

In the decompression phase (Fig. 3), the indices sequence in the compressed file is scanned sequentially. For each index value i at location j, a lookup is performed in the codebook to obtain the i-th centroid, which reproduces the j-th original subimage in the decompressed file. In order to achieve a better compression ratio, lossless LZW compression is applied to the compressed file before it is stored (or transmitted). Then, at the decompression phase, LZW decompression is first performed on the indices file (Fig. 4).
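A short sketch of the decompression lookup described above. Python's standard library has no LZW codec, so zlib (DEFLATE) stands in for the LZW step here; the index-to-centroid lookup is the part being illustrated, and the toy codebook, grid size, and block size are illustrative.

```python
# Rebuild an image from a (zlib-)compressed byte string of centroid indices.
# Assumes at most 256 codebook entries so each index fits in one byte.
import zlib
import numpy as np

def dc_decompress(payload, codebook, block=4, grid=(16, 16)):
    """Place, in order, the looked-up centroids into the reconstructed image."""
    indices = np.frombuffer(zlib.decompress(payload), dtype=np.uint8)
    rows, cols = grid
    out = np.zeros((rows * block, cols * block), dtype=np.uint8)
    for j, i in enumerate(indices):                     # j-th subimage uses centroid i
        r, c = divmod(j, cols)
        out[r*block:(r+1)*block, c*block:(c+1)*block] = \
            codebook[i].reshape(block, block).astype(np.uint8)
    return out

# Toy usage: an 8-entry codebook and a 16x16 grid of indices
rng = np.random.default_rng(4)
codebook = rng.integers(0, 256, size=(8, 16)).astype(float)
indices = rng.integers(0, 8, size=16 * 16).astype(np.uint8)
payload = zlib.compress(indices.tobytes())
restored = dc_decompress(payload, codebook, block=4, grid=(16, 16))
print(restored.shape)
```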

IV. Results and Discussion

We experimented with a wide variety of different images. We drew intensity graphs and pixel-value distributions in order to find a balance of the training parameters that achieves good performance (L: codebook size; MaxIntDiff: maximum acceptable byte-by-byte intensity difference between a subimage and a selected centroid; MaxTrainSet: the maximum number of inputs allowed to adjust a specific centroid). A simple conclusion seems very difficult to reach, even after a considerable number of experiments with our DC model. The lack of common characteristics among images, even within the same domain, led us to try different parameter values for each image. Although DjVu's developers claim that it treats the background of the image (especially in documents) separately [8], experiments showed that DC is better at background recognition (Fig. 6). DC recognizes the background as the centroids representing the most prominent subimages. Placing such centroids near the start of the codebook (indices 0, 1, 2, ...) guarantees many zeros in their binary representation. This leads the indices file to contain many runs of zeros, since those centroid indices are the most repeated inside the compressed file. Thus, applying LZW results in a significant improvement in compression ratio.

Table 1 shows the cases where good results were obtained. In Table 1, DC achieves a better compression ratio than DjVu. DjVu failed to surpass the DC model in the domain of non-purely-textual documents (mostly graphs and figures). In some examples, such as cdoc12.ppm (Fig. 7), the compression ratio was nearly three times that of DjVu, with a lossless restoration of the original file. Such excellent results are achieved because the number of different subimage patterns inside the original image is small. The ability to tune the quality and the compression ratio is maintained in the DC model. Such flexibility is achievable due to the private generation of the codebook for each image, a feature that is not available with a universal codebook generated for multiple images [1].

V. Conclusion and Future Work

The DC hybrid approach combining the two prominent neural models, Kohonen (KSOFM) and ART, led to a very useful and practical image compression model. It combines the best features of both: fast training (ART) and reliable/accurate centroid formation (KSOFM). Moreover, our DC model adds its own special features (e.g., the introduction of different thresholds to control centroid development), which allowed it to compete with other well-known peer techniques (e.g., Wavelet, DjVu, JPEG) in the document compression domain. Our future work will involve the expansion of the DC model into other image domains (e.g., pictorial, satellite, landscape, etc.).

References

[1] H. S. Soliman and A. Abdelali. Toward lossless image compression. Proceedings of the ISCA 8th International Conference, Denver, CO, June 24-26, 1999.
[2] Teuvo Kohonen. Self-organizing maps: optimization approaches. Artificial Neural Networks, pp. 981-990, Elsevier Science Publishers B.V. (North-Holland), 1991.
[3] Willie Chang. Neural Networks as a Vector Quantization Image Coding Model. Ph.D. dissertation, NMT, 1993.
[4] Jack M. Zurada. Introduction to Artificial Neural Systems. PWS Publishing Company, 1995.
[5] Naser M. Nasrabadi and Aggelos K. Katsaggelos. Applications of Artificial Neural Networks in Image Processing III. San Jose, California, April 1998.
[6] Naser M. Nasrabadi and Aggelos K. Katsaggelos. Applications of Artificial Neural Networks in Image Processing V. San Jose, California, 2001.
[7] Darrel Hankerson, G. A. Harris, and Peter D. Johnson, Jr. Introduction to Information Theory and Data Compression. CRC Press, 1998.
[8] http://djvu.research.att.com/home.html
[9] W. Chang, H. S. Soliman, et al. Preserving visual perception by learning clustering. ISCA International Conference on Parallel and Distributed Computing and Systems, Louisville, Kentucky, 1993.
[10] W. Chang, H. S. Soliman, et al. A learning vector quantization neural model for image data compression. IEEE Data Compression Conference, Snowbird, Utah, 1994.
[11] T. Kohonen, J. Kangas, et al. LVQ_PAK: a program package for the correct application of learning vector quantization algorithms. IEEE International Joint Conference on Neural Networks, 1992.
[12] E. Erwin, K. Obermayer, et al. Self-organizing maps: ordering, convergence properties and energy functions. Biological Cybernetics 67, pp. 35-45, 1992.

Figure 1: DC training-compression flowchart. The flowchart steps are:
1. Divide the image into vector subimages, and reset the codebook occupied-entries counter (COE = 0). Set L to be the maximum number of entries in the codebook.
2. Present an input vector subimage S.
3. Among the COE existing centroids, find the possible-winning-centroids set, PWCS, i.e., the centroids with acceptable pixel differences from S (within the threshold).
4. If PWCS is empty and COE < L, add a new cluster j (COE = COE + 1) with S as its centroid. If PWCS is empty and the codebook is full, consider the entire codebook to be the PWCS.
5. If PWCS is not empty, find j, the index of the centroid closest to S based on the Euclidean distance; add S to cluster j and update its center of mass Cj to be the average of the newly formed cluster, only if the cluster is still of reasonable size (less than MaxTrainSet).
6. Save j into the indices table as the representative index of S.
7. If more subimages remain to be trained, return to step 2; otherwise, construct the compressed file by appending the codebook entries to the end of the indices file, and compress it losslessly (LZW).

Figure 2: Training-compression phase. The input vector subimage domain is presented sequentially; each subimage is searched against the existing clusters, a new cluster is added if needed, the winning cluster is trained, and the index of the subimage's cluster is saved (in order) to the indices file. The codebook and indices are then concatenated and compressed using LZW to form the compressed file.

Figure 3: DC decompression flowchart. Load the compressed file, unzip it using LZW, and extract the sequence of indices and the codebook table. Point at the beginning of the indices sequence; for each centroid index i, look up centroid Ci in the codebook table and place it, in order, in the decompressed image file, replacing the original corresponding subimage Si. Repeat until no more indices remain, which ends the decompression process.

Figure 4: Decompression phase. The compressed file is decompressed using LZW and split into the indices sequence and the codebook; the indices are extracted sequentially, each is looked up to obtain the corresponding centroid, and the extracted centroids are saved (in order) to form the reconstructed image.

Figure 5: Quality variation under different parameter values (L = 33, 256, 1024; MaxIntDiff = 0, 50, 200; MaxTrainSet = 1, 10, 200).

Figure 6: Background compression using DC, JPEG, DjVu and Gzip. Compression ratios: Gzip 950.96:1; DjVu 2,125.53:1; DC 11,565.40:1; JPEG 53.10:1 (quality: foggy); DjVu 55.75:1 (quality: foggy); DC 389.52:1 (quality: lossless).

Image       | DjVu Comp. Ratio | DjVu Quality | DC Comp. Ratio | DC Quality
Cdoc4.ppm   | 58.70            | Good         | 73.76          | 32.58
Cdoc6.ppm   | 82.63            | Good         | 89.45          | 32.83
Cdoc7.ppm   | 63.19            | Good         | 65.70          | 34.06
Cdoc10.ppm  | 94.06            | Good         | 164.74         | 34.65
Cdoc12.ppm  | 58.15            | Good         | 156.79         | Lossless
Cdoc15.ppm  | 50.71            | Foggy        | 54.28          | 34.78
Cdoc19.ppm  | 83.30            | Good         | 91.85          | 36.67
Cdoc21.ppm  | 88.38            | Good         | 92.19          | 34.22
Cdoc22.ppm  | 48.96            | Good         | 70.46          | 34.02
Cdoc23.ppm  | 120.78           | Good         | 122.01         | 36.76
Cdoc24.ppm  | 84.40            | Good         | 85.81          | 31.53
Cdoc26.ppm  | 41.06            | Foggy        | 80.15          | 31.99
Cdoc27.ppm  | 189.36           | Good         | 192.05         | 31.75
Cdoc32.ppm  | 48.46            | Foggy        | 53.63          | 34.27
Cdoc39.ppm  | 77.31            | Good         | 136.89         | 32.83
Cdoc41.ppm  | 28.99            | Foggy        | 74.50          | 32.20
Cdoc42.ppm  | 95.03            | Foggy        | 207.23         | 31.44
Cdoc43.ppm  | 36.06            | Good         | 61.29          | 31.30
Cdoc44.ppm  | 87.30            | Good         | 118.37         | 31.53
Cdoc45.ppm  | 51.15            | Foggy        | 53.57          | 30.33
Cdoc47.ppm  | 58.27            | Foggy        | 70.25          | 31.58
Cdoc48.ppm  | 28.30            | Foggy        | 61.72          | 31.36

Table 1: Compression results using DC and DjVu.

Figure 7: Examples of decompressed documents using DC and DjVu. For each of Cdoc10.ppm, Cdoc12.ppm, and Cdoc29.ppm, the original is shown alongside the DC-compressed and DjVu-compressed versions.