Unified VLSI Systolic Array Design for LZ Data Compression

Shih-Arn Hwang and Cheng-Wen Wu
Dept. of EE, NTHU, Taiwan, R.O.C.
IEEE Trans. on VLSI Systems, Vol. 9, No. 4, Aug. 2001, pp. 489-499
Presenter: Liang-Bi Chen

Abstract
Hardware implementation of data compression algorithms is receiving attention due to exponentially expanding network traffic and digital data storage usage. In this paper, we propose several serial one-dimensional and parallel two-dimensional systolic arrays for Lempel-Ziv data compression. A VLSI chip implementing our optimal linear array has been fabricated and tested. The proposed array architecture is scalable. Also, multiple chips (linear arrays) can be connected in parallel to implement the parallel array structure and provide a proportional speedup.
2005/10/5 Unified VLSI Systolic Array Design for LZ Data Compression L. -B. Chen 2/27

Outline
What's the problem?
Introduction
Systolic Algorithm Design
Systolic LZ Compressor Design
Implementation
Conclusion

What's the problem?
LZ-based algorithms have been widely implemented in software, for example Compress, Zoo, lha, PKZIP, and arj. However, their speed is still too low for real-time applications such as wireless data networking and high-speed mass-storage transactions. Hence, hardware implementation is required for on-the-fly compression and decompression.

Introduction
Data compression techniques play an important role in data networks and storage utilization, as well as in promoting portable computing and data communication. Many lossless data compression techniques have been proposed and are widely used, including Huffman coding, arithmetic coding, run-length coding, and the Lempel-Ziv compression algorithm.

LZ hardware realizations
Microprocessor approach [19]: not attractive for real-time applications, since it does not fully exploit hardware parallelism.
Content-addressable memory (CAM) approach [15]-[18]: its advantage is a constant symbol search time, so it achieves optimal compression speed; its disadvantage is high hardware cost.
Systolic-array approach [13], [14], [20]-[22].
CAM vs. systolic array: the CAM approach performs string matching by fully parallel search, whereas the systolic-array approach does it by pipelining. Compared with CAM-based designs, systolic-array compressors are slower but better in hardware cost and testability.

Systolic Algorithm Design
The major concept behind the LZ algorithm is temporal locality in the information: recently seen strings tend to recur. The buffer size (n) and the match length (Ls) determine not only the compression efficiency but also the optimal mapping direction of the array architecture.

The sequential compression algorithm
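The sequential algorithm can be sketched in Python as follows. This is a minimal LZ77-style sketch, not the paper's exact procedure: it assumes a window of the n most recent symbols, matches that stay inside the window, at most Ls - 1 matched symbols per codeword, and (pointer, length, next symbol) codewords. The names `lz77_compress`, `lz77_decompress`, and the codeword layout are our own assumptions.

```python
from typing import List, Tuple

# Assumed codeword layout: (pointer into the window, match length, next literal)
Codeword = Tuple[int, int, str]


def lz77_compress(data: str, n: int, ls: int) -> List[Codeword]:
    """Sequential LZ77-style compression (illustrative sketch).

    Assumptions: the window holds the n most recent symbols, a match may not
    run past the end of the window, and a match has at most ls - 1 symbols.
    """
    out: List[Codeword] = []
    pos = 0
    while pos < len(data):
        start = max(0, pos - n)
        window = data[start:pos]
        best_len, best_ptr = 0, 0
        # Leave room for the literal that terminates every codeword.
        max_len = min(ls - 1, len(data) - pos - 1)
        for i in range(len(window)):  # compare the lookahead at every window position
            length = 0
            while (length < max_len and i + length < len(window)
                   and window[i + length] == data[pos + length]):
                length += 1
            if length > best_len:  # keep the first (lowest-pointer) longest match
                best_len, best_ptr = length, i
        out.append((best_ptr, best_len, data[pos + best_len]))
        pos += best_len + 1
    return out


def lz77_decompress(codes: List[Codeword], n: int) -> str:
    """Inverse of lz77_compress under the same window assumption."""
    out: List[str] = []
    for ptr, length, sym in codes:
        start = max(0, len(out) - n)
        for k in range(length):
            out.append(out[start + ptr + k])  # copy the matched symbols from the window
        out.append(sym)
    return "".join(out)
```

For every codeword, the inner loop compares the lookahead against every window position; in this sketch that n-way string comparison is the work a CAM performs in parallel and a systolic array pipelines.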

Simulations of compression ratio with respect to n and Ls
Simulations on various text files show that increasing both n and Ls is not the best way to obtain a high compression ratio.
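One plausible reason larger n and Ls do not always help is that they widen every codeword. The arithmetic can be sketched as follows, assuming fixed-length codewords of ceil(log2 n) + ceil(log2 Ls) + 8 bits and defining the compression ratio as compressed size over original size (smaller is better). Both the codeword format and the codeword counts used below are hypothetical, not the paper's simulation data.

```python
import math


def codeword_bits(n: int, ls: int, symbol_bits: int = 8) -> int:
    """Bits per (pointer, length, literal) codeword under an assumed fixed-length format."""
    return math.ceil(math.log2(n)) + math.ceil(math.log2(ls)) + symbol_bits


def compression_ratio(num_symbols: int, num_codewords: int, n: int, ls: int,
                      symbol_bits: int = 8) -> float:
    """Compressed size divided by original size; smaller is better here."""
    compressed = num_codewords * codeword_bits(n, ls, symbol_bits)
    return compressed / (num_symbols * symbol_bits)


# Hypothetical example: 1000 input symbols.
print(compression_ratio(1000, 200, 4096, 16))  # 24-bit codewords
print(compression_ratio(1000, 195, 8192, 16))  # 25-bit codewords, fewer of them
```

With these made-up counts, doubling n from 4096 to 8192 adds one bit to every codeword, so it pays off only if the number of codewords drops below 192; saving just 5 codewords makes the ratio worse (0.609 vs. 0.6).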

Compression ratio vs. n for different Ls values

Compression ratio vs. Ls for different n values

Systolic LZ Compression Design
Dependence Graph (DG) [26]
Objective: the maximum parallelism in an algorithm can be achieved by carefully studying the data dependencies in its computations. A DG shows the dependences among the computations that occur in an algorithm and can be considered a graphical representation of single-assignment code. From the single-assignment code, the DG of the LZ algorithm can be obtained.

The single-assignment code
To guarantee single assignment, we use an extra index, j.
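Below is one possible single-assignment formulation of the match computation, written in Python as a sketch; it is our own construction and not necessarily the paper's code. Assumptions: x is the n-symbol buffer, y is the Ls-symbol lookahead, and M[i][j] and L[i][j] (hypothetical names) hold the match flag and partial match length at buffer position i after comparing j + 1 symbols. Because both arrays are indexed by j, no variable is ever overwritten, unlike a sequential loop that keeps incrementing one length counter.

```python
from typing import List, Tuple


def match_lengths_single_assignment(x: str, y: str) -> List[int]:
    """Match length of y against x starting at each buffer position i.

    Every M[i][j] and L[i][j] is assigned exactly once (single assignment).
    Assumption: a match is not allowed to run past the end of x.
    """
    n, ls = len(x), len(y)
    if ls == 0:
        return [0] * n
    M = [[False] * ls for _ in range(n)]  # M[i][j]: x[i..i+j] == y[0..j]
    L = [[0] * ls for _ in range(n)]      # L[i][j]: matched symbols so far
    for i in range(n):
        for j in range(ls):
            eq = i + j < n and x[i + j] == y[j]
            prev_m = M[i][j - 1] if j > 0 else True
            prev_l = L[i][j - 1] if j > 0 else 0
            M[i][j] = prev_m and eq
            L[i][j] = prev_l + int(M[i][j])
    return [L[i][ls - 1] for i in range(n)]


def longest_match(x: str, y: str) -> Tuple[int, int]:
    """Return (pointer, length) of the longest match; ties go to the lowest i."""
    lengths = match_lengths_single_assignment(x, y)
    if not lengths:
        return (0, 0)
    best = max(range(len(lengths)), key=lengths.__getitem__)
    return best, lengths[best]
```

Each (i, j) pair corresponds to one node of the DG, and each read of M[i][j - 1] or L[i][j - 1] corresponds to one dependence edge.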

Global DG of the compression algorithm
A DG that contains global signals is called a global DG.

Localized DG
The global DG can be transformed into a localized DG, in which only local communication is involved.
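Localization can be illustrated on a match-length computation of this kind. In this Python sketch (our own construction, not necessarily the paper's localized DG; all names are hypothetical), the global broadcast of y[j] to every buffer position is replaced by passing it from node (i - 1, j) to node (i, j), and x[i + j] is passed diagonally from node (i + 1, j - 1), so every node reads only its immediate neighbors.

```python
from typing import List, Optional


def match_lengths_localized(x: str, y: str) -> List[int]:
    """Match lengths computed with only nearest-neighbor data movement.

    Assumption: a match is not allowed to run past the end of x.
    """
    n, ls = len(x), len(y)
    if ls == 0:
        return [0] * n
    Yp = [[""] * ls for _ in range(n)]                       # propagated copies of y[j]
    Xp: List[List[Optional[str]]] = [[None] * ls for _ in range(n)]  # copies of x[i + j]
    M = [[False] * ls for _ in range(n)]
    L = [[0] * ls for _ in range(n)]
    for j in range(ls):
        for i in range(n):
            # y[j] enters at i == 0 and moves one node per step along i.
            Yp[i][j] = y[j] if i == 0 else Yp[i - 1][j]
            # x[i] enters at j == 0 and moves diagonally; None marks "past the end of x".
            if j == 0:
                Xp[i][j] = x[i]
            else:
                Xp[i][j] = Xp[i + 1][j - 1] if i + 1 < n else None
            eq = Xp[i][j] is not None and Xp[i][j] == Yp[i][j]
            prev_m = M[i][j - 1] if j > 0 else True
            M[i][j] = prev_m and eq
            L[i][j] = (L[i][j - 1] if j > 0 else 0) + int(M[i][j])
    return [L[i][ls - 1] for i in range(n)]
```

In this sketch each node reads only from (i - 1, j), (i + 1, j - 1), and (i, j - 1), which is the property that lets a DG be projected onto an array of locally connected processing elements.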

Type-1 array [14]

The double buffer

Type-2 Array

The longest match length decision block

On-line buffer updating unit

Interleaved Type-2 (Type-2i) Array

Type-3

Type-4


Parallel Type-2i Array

Implementation

Conclusions
By investigating possible mapping and scheduling directions on the dependence graph, we propose the optimal array structure for LZ compression, which is better than two recently proposed designs with respect to hardware cost and testability. Parallel arrays obtained from the block transform of the dependence graph can be used to improve the compression rate (throughput). They provide a cost/performance trade-off between the two extremes.