A LOSSLESS INDEX CODING ALGORITHM AND VLSI DESIGN FOR VECTOR QUANTIZATION

Ming-Hwa Sheu, Sh-Chi Tsai and Ming-Der Shieh
Dept. of Electronic Eng., National Yunlin Univ. of Science and Technology, Yunlin, Taiwan, R.O.C.
E-mail: sheumh@cad.el.yuntech.edu.tw

Abstract

This paper presents a switching-tree coding (STC) algorithm that re-encodes the codevector indexes output by vector quantization. Based on the connections within the index neighborhood, we construct three binary trees that allocate the optimal variable-length noiseless code to each index. Simulation results indicate that, compared with conventional memoryless VQ, this algorithm improves coding efficiency without introducing any extra coding distortion. In addition, an efficient VLSI architecture is derived from the new algorithm under the requirements of low cost and high performance. The gate counts of the encoder and decoder are about 5000 and 4800, respectively. Verilog simulation shows that the whole architecture runs at a clock rate of 50 MHz in a 0.6 µm CMOS 1P3M technology.

I. Introduction

Vector quantization (VQ) is one of the important techniques for image compression. Owing to its high compression ratio, VQ is widely used in applications such as multimedia, high-definition television, teleconferencing systems, and image database management [1, 2]. Basically, VQ is a source coding method that maps each image block of pixels to a corresponding index based on a codebook. The transmitted index has far fewer bits than the original image block. Various schemes have been proposed to improve VQ performance, which includes computational complexity, compression ratio, and picture quality.

Most researchers focus on codebook generation and vector encoding search. The LBG algorithm [3] and Kohonen self-organizing maps [4] are two common schemes for codebook generation. They preprocess and classify many training images to form a set of k-dimensional codewords. For vector encoding search, the input image is first divided into a set of k-dimensional vectors (or blocks). Next, each vector is matched against the existing codebook to find the best-match codeword under the minimum Euclidean distance criterion. Then the index (address) of the best-match codeword is transmitted instead of the input vector. However, finding the best-match codeword in a given codebook always requires an enormous amount of computation. To speed up the search procedure, algorithms such as TIE [5] and DTPC [6] presented fast methods for vector encoding search based on extra precalculated data. In addition, MPS [7], ENNS [8], and DAM [9] proposed efficient algorithms that improve both codebook generation and vector encoding search. In a VQ system, high picture quality depends strongly on a large codebook. On the other hand, a larger codebook lowers the compression ratio and requires more encoding operations.

In order to improve coding efficiency, a lossless re-encoding process can be added to a traditional VQ system, as shown in Fig. 1. IGA [10] considered the high correlation between neighboring codevector indexes and first proposed an effective group-search scheme to re-encode identical indexes. However, IGA is not suitable for VLSI implementation, since the group-search operation is not very regular. It also requires a huge amount of extra memory in both the encoding and decoding operations. For instance, if the codebook size is 256 and the image is 512*512 pixels, then IGA requires 4M bits (128*128*256) of memory to save the status map. In this paper, we propose an STC algorithm that can immediately re-encode/decode the sequentially input indexes using only 1.5K bits of memory. Then, based on the developed algorithm, a low-cost VLSI architecture is derived and implemented.
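
The memory figures quoted above can be checked with simple arithmetic. The sketch below is only a sanity check. The IGA figure follows the 128*128*256 product given in the text. The STC breakdown assumes the buffer sizes listed in the hardware summary of Section III (a 128*2-bit flag latch, a 128*8-bit shift register, and a 32-entry search memory), and it further assumes that each search-memory entry holds one 8-bit index. All variable names are illustrative.

```python
# Sanity check of the memory figures for a 512*512 image,
# 4*4-pixel blocks, and a codebook of 256 codevectors.
IMAGE_SIDE = 512
BLOCK_SIDE = 4
CODEBOOK_SIZE = 256

map_side = IMAGE_SIDE // BLOCK_SIDE             # 128 indexes per row and column
index_bits = (CODEBOOK_SIZE - 1).bit_length()   # 8 bits per index

# IGA status map, as given in the text: 128 * 128 * 256 bits.
iga_bits = map_side * map_side * CODEBOOK_SIZE

# STC buffers, taken from the hardware summary in Section III.
flag_latch_bits = map_side * 2                  # one 2-bit flag per column
shift_register_bits = map_side * index_bits     # one row of previous indexes
search_memory_bits = 32 * index_bits            # ASSUMPTION: 8 bits per entry

stc_bits = flag_latch_bits + shift_register_bits + search_memory_bits

print(f"IGA: {iga_bits} bits = {iga_bits // (1024 * 1024)}M bits")
print(f"STC: {stc_bits} bits = {stc_bits / 1024}K bits")
```

Under these assumptions the three STC buffers add up to 1536 bits, which matches the 1.5K-bit figure stated above.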

Fig. 1 Lossless coding process

II. The Switching Tree Coding Algorithm

In memoryless VQ, each index is sequentially and independently mapped from a small block (4*4 pixels) divided from the input image. Neighboring indexes inherit the correlation that exists between the corresponding adjacent blocks. As a result, many neighboring blocks may be quantized into the same index. Four relations can therefore be found in a two-dimensional index map.

1. Upper-connection: the current input index is the same as the upper index.
2. Left-connection: the current input index and the left index are identical.
3. Around-connection: the current index is identical to one of the surrounding indexes other than the upper and left indexes.
4. Disconnection: the current index differs from all indexes in the neighboring area.

Fig. 2 (a) shows an index map where CI represents the current input index, and Fig. 2 (b) displays the four connections for different values of CI.

Fig. 2 The correlation for the current input index
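
The four relations can be illustrated with a small classifier. This is a minimal sketch and not the authors' implementation: the function name, the string labels, and the representation of the surrounding area as a plain list are assumptions, and so is the order of the tests (the upper index is checked before the left index when both match).

```python
from typing import Optional, Sequence

UPPER = "upper-connection"
LEFT = "left-connection"
AROUND = "around-connection"
DISCONNECTED = "disconnection"


def classify_connection(ci: int,
                        ui: Optional[int],
                        li: Optional[int],
                        around: Sequence[int]) -> str:
    """Classify the current index ci against its neighbors.

    ui / li are the upper / left indexes (None at the map border).
    around holds the other nearby indexes; its exact shape is defined
    by Fig. 2 in the paper and is only modeled as a list here.
    """
    if ui is not None and ci == ui:   # ASSUMPTION: upper is tested first
        return UPPER
    if li is not None and ci == li:
        return LEFT
    if ci in around:
        return AROUND
    return DISCONNECTED


# Hypothetical neighborhood: upper index 12, left index 40, others nearby.
print(classify_connection(12, 12, 40, [7, 93]))   # upper-connection
print(classify_connection(40, 12, 40, [7, 93]))   # left-connection
print(classify_connection(93, 12, 40, [7, 93]))   # around-connection
print(classify_connection(5, 12, 40, [7, 93]))    # disconnection
```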

Basically, the bit-width of the index depends on the codebook size. For a codebook of size 256, the bit-width of the index is 8 (log2 256). To compact the index data, we propose a switching-tree coding based on the above connections. First, let us introduce three connection trees, as shown in Figs. 3 (a), (b), and (c). These trees produce different codes for the four types of connection. In Fig. 3 (a), if CI = UI (or LI), where UI and LI denote the upper and left indexes, then the current input index has an upper- (or left-) connection and is encoded as 11 (or 10). If CI belongs to the around-connection, the combination of the prefix code 01 and the around-search code (about 5 bits) stands for CI. The around-search code is described below. For a disconnection, the prefix code 00 is added to the code of CI. The other trees have similar coding operations.

For the around-connection, the current input index has to be compared with the previous indexes in the nearby area, as shown in Fig. 4. These previous indexes are stored in a memory buffer with 32 locations according to the arrow order. In order to extend the search area, all contents of the memory are distinct; that is, only one copy of any index value is kept in the shift register. When CI finds a matching index in the memory buffer, CI is encoded as the corresponding address code plus the prefix code of the connection tree.

When CI is to be coded, we need to check two neighbor flags, called the U-flag and the L-flag, to decide the proper coding tree. The L-flag and the U-flag represent the encoding lengths of the left index and the upper index, respectively. The flag values are shown in Table 1. Therefore, we use a flag buffer to keep the flags of previously encoded indexes. The size of the flag buffer is only M (where M*M is the size of the index map). Comparing the two neighbor flags results in three cases.

Case 1: U-flag = L-flag = 3, Tree A is selected.
Case 2: U-flag > L-flag, or U-flag = L-flag = 2, Tree B is selected.
Case 3: U-flag < L-flag, or U-flag = L-flag = 1, Tree C is selected.

Based on the above, the whole switching tree coding algorithm is depicted in Fig. 5. Using this algorithm, Table 2 shows the compressed bit rates for different search memory sizes, based on 512*512 images and a codebook of 256 codevectors.
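
The tree-selection rule (Cases 1 to 3), the flag definition of Table 1, and the Tree A code assignment described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names are hypothetical, the search memory is modeled as a plain list whose position serves as the 5-bit address, and only Tree A is encoded because the code assignments of Trees B and C are given only in Fig. 3.

```python
from typing import Sequence


def flag_for(code: str) -> int:
    """Table 1: flag 1 for a 1-bit code, 2 for a 2-bit code, 3 otherwise."""
    return min(len(code), 3)


def select_tree(u_flag: int, l_flag: int) -> str:
    """Choose the coding tree from the upper and left flags (Cases 1 to 3)."""
    if u_flag not in (1, 2, 3) or l_flag not in (1, 2, 3):
        raise ValueError("flags must be 1, 2, or 3")
    if u_flag == 3 and l_flag == 3:
        return "A"                                  # Case 1
    if u_flag > l_flag or u_flag == l_flag == 2:
        return "B"                                  # Case 2
    return "C"                                      # Case 3: U < L, or both 1


def encode_tree_a(ci: int, ui: int, li: int,
                  search_memory: Sequence[int]) -> str:
    """Encode ci with Tree A, returning the code as a bit string.

    ASSUMPTIONS: 8-bit indexes (codebook of 256) and a search memory of
    at most 32 distinct indexes addressed by list position.
    """
    if ci == ui:
        return "11"                                 # upper-connection
    if ci == li:
        return "10"                                 # left-connection
    if ci in search_memory:
        address = search_memory.index(ci)
        return "01" + format(address, "05b")        # prefix + 5-bit address
    return "00" + format(ci, "08b")                 # prefix + raw 8-bit index


code = encode_tree_a(9, ui=7, li=3, search_memory=[4, 9])
print(code, flag_for(code), select_tree(3, 3))
```

With Tree A, an index costs 2 bits for an upper- or left-connection, 7 bits for an around-connection, and 10 bits for a disconnection, compared with 8 bits for the plain VQ index.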


Fig. 3 Three switching binary trees

Fig. 4 The encoding of search memory

III. The VLSI Architecture Design

Based on our STC algorithm, the VLSI architecture for index encoding and decoding is designed and shown in Fig. 6. Fig. 7 then shows the simulation of indexes transmitted from the encoder to the decoder. The result shows that the proposed architecture works at a clock rate of 50 MHz, which is enough to handle a real-time image sequence with a frame size of 1024*1024 pixels at 30 frames/sec. With 4*4-pixel blocks, such a sequence produces about 1.97 million indexes per second, which leaves a budget of roughly 25 clock cycles per index at 50 MHz. The physical layout of our prototype VLSI chip is being implemented. The VLSI hardware is summarized below.

ㆍFlag latch size: 128*2 bits;
ㆍShift register: 128*8 bits;
ㆍSearch memory size: 32 correlated indexes;
ㆍGate count: about 9,800 (encoder + decoder);
ㆍDesign by cell-based approach.

IV. Conclusion

In this paper, we presented an STC algorithm that compacts the VQ indexes to achieve 0.32 bit/pixel on average. In this algorithm, four types of index connections are considered, and three binary trees are used alternately to re-encode the data. The algorithm has regular operations and a low memory requirement, so it is well suited to low-cost VLSI implementation. Hardware simulation shows that the architecture can support real-time applications with 1024*1024 image frames. Because the VLSI architecture has low cost and high performance, it can be embedded in a traditional VQ hardware system.

References

[1] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. COM-28, pp. 84-95, 1980.
[2] N. M. Nasrabadi and R. A. King, "Image coding using vector quantization: A review," IEEE Trans. Commun., vol. 36, no. 8, pp. 957-971, Aug. 1988.
[3] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, New York: Kluwer Academic, 1991.
[4] N. M. Nasrabadi and Y. Feng, "Vector quantization of images based upon the Kohonen self-organizing feature maps," IEEE 1st Int. Conf. Neural Net., 1989.

[5] C. M. Huang, Q. Bi, G. S. Stiles, and R. W. Harris, "Fast full-search equivalent encoding algorithms for image compression using vector quantization," IEEE Trans. Image Processing, vol. 1, no. 3, pp. 413-416, 1992.
[6] C. C. Chang, D. C. Lin, and T. S. Chen, "An improved VQ codebook search algorithm using principal component analysis," J. Visual Commun. Image
[7] V. Ramasubramanian and K. Paliwal, "Fast K-dimensional tree algorithms for nearest neighbor search with application to vector quantization encoding," IEEE Trans. Signal Processing, vol. 40, no. 3, pp. 518-531, 1992.
[8] S. W. Ra and J. K. Kim, "A fast mean-distance-ordered partial codebook search algorithm for image vector quantization," IEEE Trans. Circuits Syst. II, vol. 40, no. 9, pp. 576-579, 1993.
[9] L. Guan and M. Kamel, "Equal-average hyperplane partitioning method for vector quantization of image data," Pattern Recognition Lett., vol. 13, pp. 693-699, 1992.
[10] C. H. Hsieh, J. C. Tsai, and P. C. Lu, "Noiseless coding of VQ index using index grouping algorithm," IEEE Trans. Commun., vol. 44, no. 12, Dec. 1996.

Table 1 Coding flag

Flag    Coding length
1       1-bit
2       2-bit
3       > 2-bit

Table 2 Bit rates for 512*512 images
A: search memory size = 8 correlated indexes;
B: search memory size = 16 correlated indexes;
C: search memory size = 32 correlated indexes.

Fig. 5 The STC algorithm

Fig. 6 VLSI architecture for VQ index encoder/decoder

Fig. 7 Gate-level simulation result