Implication of Variable Code Block Size in JPEG 2000 and Its VLSI Implementation

Ping-Sing Tsai a, Tinku Acharya b,c

a Dept. of Computer Science, Univ. of Texas Pan American, 1201 W. Univ. Dr., Edinburg, TX USA 78539-2999; b Avisere Inc., Tucson, AZ USA 85710; c Dept. of Electrical Engineering, Arizona State University, Tempe, AZ USA 85287

ABSTRACT

JPEG 2000 is the new standard for image compression. The features of this standard make it suitable for imaging and multimedia applications in this era of wireless and Internet communications. The discrete wavelet transform and embedded bit plane coding are the two key building blocks of the JPEG 2000 encoder. The JPEG 2000 architecture for image compression also makes high-quality compression possible in video mode, i.e., Motion JPEG 2000. In this paper, we present a study of the compression impact of using variable code block sizes at different levels of the DWT instead of the fixed code block size specified in the original standard. We also discuss the advantages of using variable code block sizes and its VLSI implementation.

Keywords: image compression, JPEG 2000, rate-distortion optimization, variable code block size, VLSI

1. INTRODUCTION

JPEG 2000 is the new standard [1-3] for image compression. Although the JPEG (Joint Photographic Experts Group) standard [4, 5] for image compression has been successful for more than a decade, it lacks many features desired by interactive multimedia applications in the current era of information technology and multimedia communication [6]. A fundamental shift in the modern image compression approach came after the discrete wavelet transform (DWT) became popular because of its multi-resolution nature and many other desired features [7]. Many scalable image compression algorithms based on the DWT were proposed in the literature [8-11]. The DWT is the basis of the JPEG 2000 standard.
The systems architecture for the standard is optimized not only for compression efficiency at very low bit rates but also for scalability and interoperability in networked and noisy mobile communication environments. It was also developed to offer desirable functionalities such as progressive transmission, scalability, region-of-interest coding, random access, and error resilience. In fact, most state-of-the-art technologies of still image compression were integrated into the JPEG 2000 standard. Even though the basic key encoding modules of JPEG 2000, such as the discrete wavelet transform, quantization, bit plane coding, and binary arithmetic coding, are clearly specified, some implementation choices were left to individual developers. Among these, rate-distortion optimization plays a key role in a JPEG 2000 implementation. In this section, we briefly give an overview of the JPEG 2000 standard.

(Author contact: pstsai@ieee.org; phone 1 956 292-7229; fax 1 956 384-5099; cs.panam.edu)

1.1 Overview of the Core Coding System of the JPEG 2000 Standard

JPEG 2000 Part I defines the core coding system of the JPEG 2000 standard. In Fig. 1, we show the functional block diagram of the JPEG 2000 core coding system. We show the basic dataflow diagram in Fig. 2 with the modification of variable code block size instead of the fixed size, which is the central theme of this paper. As shown in Fig. 2, the image components can be divided into rectangular tiles to make the standard suitable for compressing very large images. DC level shifting is performed on these components, followed by either an irreversible or a reversible component transformation. The component transformation helps improve compression performance. Each component of a tile is independently
transformed by the discrete wavelet transform (DWT) [3, 8]. In JPEG 2000, the irreversible 9/7 wavelet transform is used for lossy compression, and the reversible lifting-based 5/3 wavelet transform is specified for lossless compression. For lossy compression, uniform scalar quantization with a dead zone at the origin is applied to the samples of the subbands in the wavelet domain. The quantization step size can be determined from the dynamic range of the samples in a subband. After quantization, each subband is divided into non-overlapping rectangular blocks, called code blocks. Code blocks are the basic unit for entropy coding, and each code block is encoded independently. The size of a code block is typically 32×32 or 64×64. The entropy encoding of each code block in the JPEG 2000 standard is done bit plane by bit plane, from the most significant bit plane to the least significant bit plane of the code block. However, the constraint of a fixed code block size is due to the fact that the coding style default (COD) and/or coding style component (COC) marker segments in a JPEG 2000 compressed file can carry only one set of code block width and height information for an image, a tile, or a tile-component. A unique feature of JPEG 2000 is region of interest (ROI) coding, which allows different regions of an image to be coded with different fidelity criteria. The MAXSHIFT method proposed by Christopoulos et al. [12, 13] is adopted by the JPEG 2000 Part 1 standard. The entropy encoding in JPEG 2000 consists of fractional bit plane coding (BPC) and binary arithmetic coding (BAC) [1, 2, 10]. The combination of BPC and BAC is also referred to as Tier 1 coding in the standard. BPC has three passes in each bit plane: the Significance Propagation Pass (SPP), the Magnitude Refinement Pass (MRP), and the Cleanup Pass (CUP). Each pass generates context models and the corresponding binary data, to be encoded by BAC to produce the compressed bit stream. As a result, each code block has an independent bit stream.
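The code block partition and the MSB-to-LSB bit plane scan described above can be sketched as follows. This is a simplified illustration, not the standard's normative scan pattern; the NumPy-based helpers, their names, and the 32×32 default are our own choices.

```python
import numpy as np

def code_blocks(subband, cb_h=32, cb_w=32):
    """Split a quantized subband into non-overlapping code blocks.

    Blocks on the right/bottom edges may be smaller than cb_h x cb_w."""
    h, w = subband.shape
    for r in range(0, h, cb_h):
        for c in range(0, w, cb_w):
            yield subband[r:r + cb_h, c:c + cb_w]

def bit_planes(block):
    """Yield magnitude bit planes from most to least significant,
    mirroring the MSB-to-LSB coding order of Tier 1."""
    mag = np.abs(block.astype(np.int64))
    for p in range(int(mag.max()).bit_length() - 1, -1, -1):
        yield ((mag >> p) & 1).astype(np.uint8)

# A 64x64 subband yields four 32x32 code blocks.
blocks = list(code_blocks(np.zeros((64, 64), dtype=np.int32)))
```

Each yielded code block would then be handed to an independent BPC/BAC pair, which is what makes the block-parallel architecture of Section 1.2 possible.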
These independent bit streams of all the code blocks are combined into a single bit stream using Tier 2 coding, which is based on the result of rate-distortion optimization. An efficient rate-distortion algorithm provides possible truncation points of the bit streams in an optimal way to minimize distortion for any given target bit rate. However, in order to obtain an optimal solution such as the EBCOT (Embedded Block Coding with Optimized Truncation) method proposed by D. Taubman [10, 14], one needs to buffer all the bit streams from all the code blocks. This is expensive for a VLSI implementation of the JPEG 2000 architecture. In Tier 2 coding, the independent bit streams generated in Tier 1 coding are multiplexed to compose the final compressed output bit stream. Tier 2 also efficiently generates the header information to indicate the ordering of the resulting coded blocks and the corresponding coding passes.

Fig. 1. Block diagram of the JPEG 2000 encoder (source image, forward multi-component transformation, forward wavelet transform, quantization, Tier 1 encoder, and Tier 2 encoder producing the coded image, with rate control and region-of-interest inputs).

1.2 A Systems Architecture for VLSI Implementation

Needless to say, the DWT, BPC, and BAC are the computationally and memory intensive blocks of the JPEG 2000 standard. Of these blocks, the DWT is very symmetrical in nature and can be handled by dedicated hardware, DSP processors, or even general-purpose processors [1, 15]. In contrast, both BPC and BAC are very much control intensive and have to be performed in a sequential fashion. Furthermore, memory interactions are substantial in an EBCOT implementation of BPC. In order to simplify the control mechanism for VLSI implementation and enable efficient memory access, we studied the use of variable-size code blocks at different DWT resolutions and its implications.
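The truncation-point selection that Tier 2 relies on, described in Section 1.1 above, can be sketched with a Lagrangian search. This is a toy illustration in the spirit of post-compression rate-distortion optimization, not the EBCOT algorithm itself; the candidate lists, the bisection bounds, and the function names are our own assumptions.

```python
def pick_truncation(points, lam):
    """points: candidate truncation points (rate_bytes, distortion) for one
    code block, including (0, full_distortion). Minimize D + lam * R."""
    return min(points, key=lambda rd: rd[1] + lam * rd[0])

def allocate(blocks, budget, iters=60):
    """Bisect the Lagrange multiplier so the total truncated rate meets
    the byte budget (assumes convex, monotone candidates per block)."""
    lo, hi = 0.0, 1e9
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        total = sum(pick_truncation(p, lam)[0] for p in blocks)
        if total > budget:
            lo = lam  # over budget: penalize rate more
        else:
            hi = lam
    # hi always satisfies the budget, so use it for the final choice.
    return [pick_truncation(p, hi) for p in blocks]

# Two code blocks, each with three candidate truncation points.
blocks = [[(0, 100), (10, 50), (20, 10)], [(0, 80), (5, 40), (15, 5)]]
chosen = allocate(blocks, budget=15)
```

Note that even this toy version needs the full candidate lists of every code block in memory at once, which is exactly the buffering cost the text identifies as expensive in hardware.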
The block diagram of a systems architecture published earlier [1, 11, 16] for VLSI implementation is shown in Fig. 3. The architecture consists of a DWT coder and three pairs of BPC and BAC coders. It also consists of six memory blocks: three subband MEM blocks between the DWT coder and the three BPC coders, and three CXD buffers between the BPC and BAC coders. A global controller is required to control the interactions between all these blocks. The data flow of the architecture is as follows. The DWT is applied on the image to generate the three subbands at each level. The LL subband data is written back to the DWT block for the next level of decomposition. The data from the code blocks (formed from the quantized subbands) is written into the subband MEMs. Each BPC reads the data from the corresponding subband MEM and writes the context-data pairs into the corresponding CXD buffer. Each BAC reads from its CXD buffer and generates the code stream for each code block. At the last level, the LL subband is entropy coded using the HL entropy coder pair.

Fig. 2. JPEG 2000 encoder dataflow diagram with the modification of variable-size code blocks (a component of an image passes through DWT and quantization into variable-size code blocks per subband, then through BPC + BAC into bit stream formation and bit-rate control, with an ROI input).

In the architecture shown in Fig. 3, three sets of BPC and BAC pairs are needed to handle the large number of computations during entropy coding. For instance, to encode an N×N code block with one bit position being coded in each cycle, N×N×P×3 cycles are required. This is because the internal precision is assumed to be P bits for lossless performance (usually 16), and BPC performs the coding in three passes. On top of this, the BAC requires at least two table lookups and two additions per bit.
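The cycle estimate above works out as follows; this is a back-of-the-envelope sketch, and the function name and default parameters are our own.

```python
def tier1_cycles(n, precision=16, passes=3):
    """Worst-case Tier 1 cycles for an n x n code block when one bit
    position is coded per cycle: every sample is visited in each of the
    three coding passes of every one of the P magnitude bit planes."""
    return n * n * precision * passes

# A 64x64 code block at 16-bit internal precision:
# 64 * 64 * 16 * 3 = 196,608 cycles per block, before BAC overhead.
```

A 32×32 block needs a quarter of that, which is why shrinking the code blocks at coarser DWT levels barely affects the total entropy coding workload.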
The entropy coding process can easily be implemented in parallel, as the code blocks are entropy coded independently. That is why three sets of coders have been used in this architecture to perform the entropy coding of the subbands in parallel. Further parallelization is possible in coding the individual code blocks, but this level of parallelization does not help much when the code blocks become small at higher levels of the DWT. The working principles and underlying logic of the architecture, the memory organization, the specialized buffer logic, the state machine logic for the control circuit that manages all the basic modules, the timing analysis, and the synthesis results have been covered in great detail in Acharya et al. [1, 11]. We omit a detailed description of this architecture for lack of space.
Fig. 3. Systems architecture for the JPEG 2000 encoder (a global controller coordinates the DWT; three bit-plane subband MEMs for the HL, LH, and HH subbands feed three BPC coders, which write context-data pairs into three CXD buffers read by three BAC coders producing the bit streams).

2. CONSTRAINT OF CODE BLOCK SIZE IN JPEG 2000

The code block (CB) is the basic unit of data for entropy encoding in the JPEG 2000 standard. The code block width and height in the JPEG 2000 standard are limited to powers of two in the range 2^2 to 2^10. All the code blocks within the same tile of a component need to have the same width and height according to the JPEG 2000 standard. The width and height of a code block are specified in either the coding style default (COD) and/or coding style component (COC) marker segments. However, the constraint of a fixed code block size is due to the fact that the COD and/or COC marker segments in a JPEG 2000 compressed file can carry only one set of code block width and height information for an image, a tile, or a tile-component. Let us assume that the selected size of the code block is x×y. The same size is applied to all the DWT subbands at all resolutions of the subband decomposition, as shown in Fig. 4(a) as an example. Code block CB0 in resolution 0 (the 2LL subband) corresponds to CB1 in the 2HL subband, CB2 in the 2LH subband, and CB3 in the 2HH subband in resolution 1, and to code blocks CB4-CB15 in the 1HL, 1LH, and 1HH subbands in resolution 2, as shown in Fig. 4(a). They are encoded in the order CB0 to CB15 to compress the corresponding spatial region in the image and are arranged in that order in the compressed bit stream. The JPEG 2000 Part 1 amendments introduce Profile-0 and Profile-1, which restrict the set of possible values for the coding parameters and the options for code block size. The code block size is restricted to either 32×32 or 64×64 for Profile-0, and to 64×64 or smaller for Profile-1.
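The dimension constraints above can be checked with a small helper. This is our own sketch; the additional cap of 4096 samples per code block (width exponent plus height exponent at most 12) also comes from the Part 1 standard.

```python
def is_valid_code_block(w, h):
    """Check the Part 1 code block dimension rules: width and height must
    be powers of two between 2**2 and 2**10, and the standard further
    caps the block at 4096 samples (w * h <= 2**12)."""
    def pow2_in_range(v):
        return 4 <= v <= 1024 and (v & (v - 1)) == 0
    return pow2_in_range(w) and pow2_in_range(h) and w * h <= 4096

# 64x64 is the largest square block allowed: 64 * 64 = 4096 samples.
```

The 4096-sample cap is why 64×64 is in practice the largest usable square code block, even though each dimension alone may go up to 1024.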
The fixed code block size, together with the definitions of packet, layer, resolution, component, and precinct (see references 1 and 2 for detailed definitions), provides the flexibility of different progression orders in the bit stream. However, in order to keep track of all the code blocks corresponding to the same spatial location of an image, one needs to implement a quad-tree type data structure. Such implementation overhead, especially for a hardware-based implementation, is not suitable for a closed-loop type encoder-decoder system.
Fig. 4. Code blocks: (a) fixed size as specified in the standard, (b) variable size at different DWT resolutions.

Fig. 5. A closed-loop type encoder-decoder system.

3. VARIABLE CODE BLOCK SIZE AT DIFFERENT DWT RESOLUTIONS

We studied the implication of using variable-size code blocks at different DWT resolutions as an alternative to the fixed-size code block prescribed in the JPEG 2000 standard. For a closed-loop type encoder-decoder system, we do not need to use the marker segments to pass the code block size information between the encoder and the decoder. The block diagram of a closed-loop type encoder-decoder system is depicted in Fig. 5. As a result, the use of variable code block sizes makes the control mechanism, memory handling, bit-rate control, etc. simpler.
As illustrated in Fig. 4(b), let us assume that the size of the code block in the 2LL subband (say CB0) is x×y. Since the dimensions of 2HL, 2LH, and 2HH are the same as the dimensions of 2LL, the size of the corresponding CB1, CB2, and CB3 will also be x×y. Since the subbands at the next lower DWT level, 1HL, 1LH, and 1HH, are twice the size in both rows and columns, we explored the impact of a code block size of 2x×2y at this level, proportionally increasing the size of the code blocks at further lower DWT levels. As a result, the size of CB4, CB5, and CB6 will be 2x×2y. Our study shows that there are several advantages to this approach. They are as follows:

- The concept of variable code block size as described above ensures the same number of code blocks in the DWT subbands at all resolutions. This makes the control of the related code blocks for a particular spatial location easier. Instead of using a quad-tree type data structure, we can keep track of all the code blocks that correspond to the same spatial location of an image using a simple linear data structure. This makes both hardware- and software-based implementations simpler.

- The control unit of the VLSI architecture becomes simpler due to the equal number of code blocks in every subband, and the number of buffers needed to store the code blocks temporarily is reduced without changing the total memory required.

- The memory access pattern becomes very simple, and the related code blocks for the same spatial location can be put into the same memory bank in a more compact form.

- For rate control, we need to deal with fewer individual bit-stream units (one for each independent code block), and the rate-distortion optimization algorithm requires less computation because it deals with fewer data points. Usually, the rate-distortion optimization is implemented by a microcontroller in the VLSI architecture for hardware implementation. This may make it possible to use a lower-cost and lower-power microcontroller in the hardware architecture.

- For a closed-loop type encoder-decoder system, we do not need to use the marker segments (COC or COD) to pass the code block size information between the encoder and decoder [17]. Since there is a predefined set of code block sizes for the different DWT resolutions in the system, the encoder and decoder can both function normally.

- There is little or no impact on compression efficiency in using variable-size code blocks compared to the fixed-size code blocks prescribed in the standard.

We have tested the impact of variable-size code blocks on compression efficiency with different images. We show the compression efficiency comparison for a typical image (the BIKE image, Fig. 6) in Fig. 7 for fixed-size and variable-size code blocks. We used 4 levels of DWT decomposition. Code block sizes of 64×64, 32×32, 16×16, and 16×16 were used for DWT level 1, level 2, level 3, and level 4, respectively. We have indicated the corresponding compression efficiency (in terms of the compressed size of the image) by the label Dynamic 1 in the graph in Fig. 7. We have labeled the graph Fixed 32×32 to show the relative compression efficiency when fixed-size 32×32 code blocks are used at all DWT resolution levels. The graph labeled Fixed 64×64 indicates the performance when fixed-size 64×64 code blocks are used at all DWT resolutions. In addition to the variable-size code blocks proposed above, we also experimented with another possible variation using different heights and widths of the code blocks. In this case, we used code block sizes of 64×64, 64×32, 64×16, and 64×8 for the 4 different DWT levels. The compression efficiency with this kind of variable-size code block is shown by the graph labeled Dynamic 2 in Fig. 7. The compression efficiency for different code block choices with some other interesting images has been analyzed in a separate paper [17].
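The size schedule used in the experiment above can be generated mechanically. The helper below reproduces the Dynamic 1 schedule; the function name and the cap/floor parameterization are our own illustration, not part of the standard.

```python
def cb_size_for_level(level, finest=64, min_side=16):
    """Size of the (square) code block at a given DWT level, 1 = finest.

    The finest level uses finest x finest blocks; each coarser level
    halves the side, but never below min_side, matching the Dynamic 1
    schedule (64x64, 32x32, 16x16, 16x16 for levels 1..4)."""
    side = max(min_side, finest >> (level - 1))
    return (side, side)

# Levels 1..4 of a 4-level decomposition:
sizes = [cb_size_for_level(l) for l in range(1, 5)]
```

Because the subband dimensions also halve at each coarser level, this schedule keeps the number of code blocks per subband constant across levels (down to the floor), which is the property that replaces the quad-tree bookkeeping with a linear index.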
Fig. 6. The BIKE test image.

Fig. 7. Compressed size comparison (in kilobytes, per subband from LL4 through HH1) of the BIKE image using different fixed and variable code block sizes (Fixed 64×64, Fixed 32×32, Dynamic 1, Dynamic 2).
4. CONCLUSIONS

In the JPEG 2000 standard, all the code blocks within the same tile of a component need to have the same width and height. The width and height of a code block are specified in either the coding style default (COD) and/or coding style component (COC) marker segments, and these marker segments in a JPEG 2000 compressed file can carry only one set of code block width and height information for an image, a tile, or a tile-component; this is the origin of the fixed code block size constraint. We have studied the implication of using variable code block sizes at different DWT resolution levels instead of the fixed-size code blocks prescribed by the JPEG 2000 standard. The simulation results show that there is no significant impact on compression efficiency if one uses different code block sizes at different DWT levels compared to a fixed code block size. Even though variable code block sizes are not JPEG 2000 compliant, they can easily be employed in a closed-loop type encoder-decoder system. Variable-size code blocks at different DWT resolutions offer several benefits for both software and VLSI implementations. We have presented some simulation results and discussed the advantages in this paper.

REFERENCES

1. T. Acharya and P. S. Tsai, JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures, John Wiley & Sons, Hoboken, New Jersey, 2004.
2. D. Taubman and M. Marcellin, JPEG2000: Image Compression Fundamentals, Standards and Practice, Kluwer Academic Publishers, Boston, 2002.
3. JPEG 2000 Part I: Final Committee Draft (ISO/IEC FCD15444-1), ISO/IEC JTC1/SC29/WG1 N11855, March 2000.
4. W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993.
5. ISO/IEC 10918-1 and ITU-T Recommendation T.81,
Information technology - Digital compression and coding of continuous-tone still images: Requirements and guidelines, 1994.
6. T. Acharya and A. K. Ray, Image Processing: Principles and Applications, Wiley Interscience, Hoboken, New Jersey, 2005.
7. S. Mallat, "A theory for multi-resolution signal decomposition: the wavelet representation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, 1989.
8. M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. on Image Processing, vol. 1, pp. 205-220, April 1992.
9. A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, 1996.
10. D. Taubman, "High performance scalable image compression with EBCOT," IEEE Trans. on Image Processing, vol. 9, no. 7, pp. 1158-1170, July 2000.
11. K. Andra, C. Chakrabarti, and T. Acharya, "A high performance JPEG 2000 architecture," IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 3, pp. 209-218, March 2003.
12. D. Nister and C. Christopoulos, "Lossless region of interest with embedded wavelet image coding," Signal Processing, vol. 78, no. 1, pp. 1-17, 1999.
13. C. Christopoulos, J. Askelof, and M. Larsson, "Efficient region of interest coding techniques in the upcoming JPEG2000 still image coding standard," in Proc. of the International Conference on Image Processing (ICIP 2000), vol. 2, pp. 41-44.
14. T. Kim, H. Kim, P.-S. Tsai, and T. Acharya, "Memory efficient progressive rate-distortion algorithm for JPEG2000," IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 1, pp. 181-187, 2005.
15. T. Acharya and C. Chakrabarti, "A survey on lifting-based discrete wavelet transform architectures," The Journal of VLSI Signal Processing, vol. 42, no. 3, pp. 321-339, 2006.
16. T. Acharya, "VLSI algorithms and architectures for JPEG2000," ACM Ubiquity, ACM Press, vol.
7, issue 35, 2006.
17. P. S. Tsai and Yann LeCornec, "Dynamic code block size for JPEG 2000," to appear in Proc. of Digital Photography IV, IS&T/SPIE Electronic Imaging Symposium, January 26-31, 2008, San Jose, CA.