A Hybrid Approach to Text Compression


Peter C. Gutmann
Computer Science, University of Auckland, New Zealand
pgut1@cs.aukuni.ac.nz

Timothy C. Bell
Computer Science, University of Canterbury, Christchurch 1, New Zealand

1 Introduction

Text compression schemes have sometimes been divided into two classes: symbolwise methods, which form a source model, typically using a finite context to predict symbols; and dictionary methods, which replace phrases (groups of symbols) in the input with a code. Symbolwise methods tend to give better compression because they form more accurate models of the text, while dictionary methods tend to be faster because multiple symbols are coded at once. It is possible to decompose some dictionary methods into equivalent symbolwise methods (Langdon 1983; Bell & Witten, in press). The decomposed method gives identical compression performance, but is slower because more coded symbols are transmitted. This decomposition is of interest primarily because it is helpful in making comparisons between the two classes of methods.

In this paper we explore a hybrid approach based on the opposite of this decomposition: the predictions of a symbolwise method are grouped together so that several characters can be coded at once. The objective is to combine the good compression of symbolwise methods with the high speed of dictionary methods. The hybrid allows tradeoffs to be made in terms of compression speed, compression performance, and memory usage. More importantly, investigating a hybrid method gives extra insights into the relationship between dictionary and symbolwise methods, and reveals that they are more closely related than might be expected.

The primary goal in the design of the hybrid method described here was to create a very fast system that is based on context modelling. We therefore begin by surveying techniques that have been used in the literature to achieve fast compression.
The current method of choice for very fast adaptive compressors is to use some variant of the LZ77 method (Ziv & Lempel 1977) in which the extent of the search for repeated strings is limited. In general this is accomplished by terminating the search after a predetermined number of potential matches has been checked. An extreme example of

this is LZRW1 (Williams 1991a), which hashes the next few characters of the input into a table of pointers that point back into the sliding window. A new phrase entering the window is added by overwriting any existing phrase stored at the same location in the hash table. Consequently, only the most recent occurrence of a phrase is stored, and even this may be lost if another phrase collides with it in the hash table. This very simple replacement strategy achieves very fast compression. The output is packed into 16-bit words to make coding even faster, with 12 bits of position information (corresponding to a window size of 4K characters) and 4 bits of length information.

LZRW2 (Williams 1991b) extends LZRW1 by storing a table of selected phrases instead of referencing the sliding window directly. The hash table entries point to a phrase table that contains pointers into the sliding window. Since the window size is no longer limited by the hash table size (the phrase table entries can point back an arbitrarily large distance), a much larger window is available, and the index can access 4K phrases instead of 4K characters. The price paid is that the decompressor has the extra overhead of maintaining the same hash table and phrase table used in the compressor, and an extra level of indirection is introduced.

LZRW3 is a further refinement, which merges the hash table and phrase table into one unified lookup table (Williams 1991b). In addition, LZRW3 variants store multiple pointers at each hash table location, with a commensurate decrease in the number of hash buckets so that the table remains the same size. Although the reduced number of buckets leads to more collisions, the increased bucket size means that more strings can be searched for a given hash value than in the simpler versions. A bucket size of 4 or 8 phrases seems to be the best tradeoff for an overall table size of 4K entries.
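The LZRW1 replacement strategy described above can be sketched as follows. This is only an illustration, not Williams's actual code: the hash function, the use of a dictionary for the slot table, and the helper names are our assumptions.

```python
# Sketch of LZRW1-style phrase indexing: a hash of the next few input
# characters selects a single slot, and a new phrase simply overwrites
# whatever pointer was stored there before (no collision resolution).
HASH_BITS = 12          # 4K slots, matching the 4K-character window

def hash3(data, pos):
    """Hash the next three characters into a table index (assumed hash)."""
    h = (data[pos] << 8) ^ (data[pos + 1] << 4) ^ data[pos + 2]
    return h & ((1 << HASH_BITS) - 1)

def find_and_update(table, data, pos):
    """Return the position previously stored for this hash (a candidate
    match, or None), then overwrite the slot with the current position."""
    slot = hash3(data, pos)
    prev = table.get(slot)
    table[slot] = pos
    return prev

table = {}
text = b"abcabcabc"
matches = [find_and_update(table, text, i) for i in range(len(text) - 2)]
# The second occurrence of "abc" (position 3) hashes to the same slot as
# the first, so the stored pointer to position 0 is recovered there.
```

Because only one pointer survives per slot, an intervening phrase that hashes to the same slot silently evicts the earlier one, which is exactly the compression-for-speed trade described above.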
Several strategies can be used to decide which entry in a bucket should be overwritten. Methods such as overwriting the least-recently-used entry could be applied, but a particularly simple strategy that performs well is to overwrite a random entry. Rather than using a random number generator, a single counter can be maintained that is incremented each time a pointer is stored into a bucket. This achieves a combination of cyclic and random overwriting. Schemes based on this idea can outperform the standard Unix compress utility in terms of both compression and (generally) speed, while using an order of magnitude less memory. The compress program only outperforms LZRW3 methods on larger files, in which its enormous dictionary is able to contain a more accurate model of the source statistics.

Hash tables are currently widely used for Ziv-Lempel methods because they provide very fast searching for prior phrases. The number of references stored at each hash table location can be limited (saving storage and time), or this limit can be applied at search time by searching only the first few references (saving time but not storage). Collision resolution can be ignored if desired, because the price paid is simply poorer compression, not failure of the system. Hash tables can also be used for symbolwise methods, in this case to locate information about previous occurrences of contexts (Raita & Teuhola 1987). Again, the speed can be improved by ignoring collisions or limiting the extent of a search, giving a trade-off against the amount of compression.
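The counter-based bucket replacement can be sketched as follows (an illustration under assumed parameters; the bucket size, class layout, and names are ours, not from the LZRW3 sources):

```python
# Sketch of LZRW3-style bucket replacement: each hash bucket holds
# BUCKET_SIZE pointers, and a single counter shared by all buckets
# decides which entry the next insertion overwrites.  The counter gives
# a cheap mix of cyclic and pseudo-random replacement without calling a
# random number generator.
BUCKET_SIZE = 4

class BucketTable:
    def __init__(self, n_buckets):
        self.buckets = [[None] * BUCKET_SIZE for _ in range(n_buckets)]
        self.counter = 0          # shared across all buckets

    def insert(self, bucket_index, pointer):
        """Store a pointer, overwriting whichever entry the counter selects."""
        self.buckets[bucket_index][self.counter] = pointer
        self.counter = (self.counter + 1) % BUCKET_SIZE

    def candidates(self, bucket_index):
        """All stored pointers for this hash value (searched for matches)."""
        return [p for p in self.buckets[bucket_index] if p is not None]

t = BucketTable(8)
for ptr in range(5):          # five insertions into the same bucket
    t.insert(3, ptr)          # the fifth wraps around and evicts entry 0
```

Because different buckets advance the shared counter at different moments, the entry evicted within any one bucket is effectively arbitrary, which is the "combination of cyclic and random overwrite" described above.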

2 A hybrid symbolwise/dictionary method

[Figure 1: Index to prior text. The already-coded window is abcdbcabdabddbc, with pointers from each context symbol to the coding positions following its prior occurrences.]

Rather than maintain an explicit data structure of contexts and phrases, our hybrid method keeps a window of previously coded text (see Figure 1). An index to the window is used to locate the phrases that are available in each context. The index could be a hash table of contexts; however, an even faster approach is a straight look-up table using a single symbol as the context. The look-up table contains a maximum of k pointers for each symbol, allowing k phrases to be stored for each context. This means that not all occurrences of a context will necessarily be indexed; for example, the earliest occurrence of the context b in Figure 1 is not indexed. Larger contexts could be used with a look-up table, but the cost in memory increases exponentially with the context size, and a hash table would be better for larger contexts because such an index would be very sparse.
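The single-symbol context index can be sketched as a plain 256-entry table (a minimal illustration; the class, the oldest-first overwrite, and the names are our assumptions):

```python
# Sketch of the hybrid method's index: one slot per possible context
# byte, each holding up to K window positions at which coding resumed
# after that context symbol occurred.
K = 8

class ContextIndex:
    def __init__(self):
        self.table = [[] for _ in range(256)]   # 256 single-byte contexts

    def add(self, context_byte, position):
        entries = self.table[context_byte]
        if len(entries) == K:
            # Oldest-first overwrite once the slot is full; the paper
            # instead suggests a cheaper pseudo-random cyclic overwrite.
            entries.pop(0)
        entries.append(position)

    def phrases(self, context_byte):
        """The (at most K) coding positions indexed under this context."""
        return self.table[context_byte]

idx = ContextIndex()
window = b"abcdbcabdabddbc"        # the already-coded text of Figure 1
for i in range(len(window) - 1):
    idx.add(window[i], i + 1)      # position following each context symbol
```

With the Figure 1 window, the context b indexes the positions following each of its occurrences; once more than K occurrences have been seen, the earliest ones are no longer indexed, matching the text above.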

At each coding step, the symbol that has just been encoded (an a in Figure 1) is used as the context. Previous occurrences of the context are located using the index, and the longest sequence of characters that has occurred in that context is located. In the example, the longest previous sequence is the second phrase indexed by a. This phrase is then identified in the output. This can be done using log2 k bits, since at most k prior contexts are indexed. Typically k is around eight, so only three bits are required to transmit the location. The number of characters that match is then transmitted. If the number of matching characters is zero, then an escape message is sent, and the next character is transmitted explicitly.

Decoding is very fast. The decoder maintains the same index structure, and simply copies the appropriate phrase from the current context. If suitable codes are used, this involves very few instructions for each symbol decoded. The above is a very general description of the hybrid method. In the following section we describe some specific implementations.

3 Variations of the hybrid method

The main aspects of the hybrid method described previously that are yet to be specified are the codes used for the output, and the method of updating the pointers. There are several components of the output that must be coded. The encoder must identify which of the k previous contexts is to be used. If we assume that each is equally likely, then a simple code of log2 k bits is appropriate. Variable-length codes could be used to favour more recent phrases, although this was not investigated because of the speed penalty of the extra book-keeping required. Coding the length of matches is more critical to compression performance, because shorter matches are considerably more likely than longer ones. One strategy is to limit the length of matches to, say, m symbols, and to code the length in log2 m bits.
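One coding step of the hybrid encoder might be sketched as follows. This is illustrative only: the output is simplified to tuples rather than packed bit codes, and the function names are ours.

```python
# Sketch of one hybrid coding step: the previous symbol is the context;
# among the (at most k) indexed prior occurrences of that context, find
# the one followed by the longest match with the upcoming text, capped
# at M symbols.  A zero-length best match becomes an escape + literal.
M = 8   # maximum match length (log2 M = 3 bits in a packed output code)

def match_length(data, prior, pos, limit):
    """Length of the common prefix of data[prior:] and data[pos:], capped."""
    n = 0
    while n < limit and pos + n < len(data) and data[prior + n] == data[pos + n]:
        n += 1
    return n

def code_step(data, pos, candidates):
    """Return ('phrase', candidate_index, length) or ('escape', literal).
    Ties go to the earliest candidate, since only strict improvements win."""
    best_i, best_len = 0, 0
    for i, cand in enumerate(candidates):
        n = match_length(data, cand, pos, M)
        if n > best_len:
            best_i, best_len = i, n
    if best_len == 0:
        return ('escape', data[pos])
    return ('phrase', best_i, best_len)
```

In a packed implementation the ('phrase', i, n) tuple would cost log2 k + log2 m bits, and the escape would cost the flag plus an explicit character, as described above.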
We have also investigated a variable-length code for this purpose. The best compression would be obtained by codes generated by Huffman's algorithm from sample length distributions, but this would incur a significant speed penalty. When a context cannot be used for coding because the next symbol has never occurred in that context, an escape symbol must be transmitted. This can be represented by a single bit that is sent at each coding step, although this assumes that the probability of an escape occurring is 50%. Better compression can be obtained by transmitting the length of a match before identifying the phrase, and using a zero length to indicate an escape. If this is represented by a variable-length code then a more appropriate length can be used. An advantage of single-bit flags is that they can conveniently be stored eight to a byte, which admits a faster implementation. An alternative that eliminates the need for escape codes is to always transmit the context symbol explicitly. This means that progress will be made at each coding step even if the match length is zero. Another possibility is for the escape code to switch to a literal mode, where symbols are transmitted explicitly until a second escape code switches back to

the context coding mode. This approach has been used for some Ziv-Lempel methods (e.g. Fiala and Greene 1989), but it is not suitable in this situation because escape symbols do not tend to occur in clusters. Another possibility is to use multi-bit flags at each coding step (Fenwick 1993). Such a flag could indicate more than one literal symbol, or could select from more than one representation of phrases. A final possibility is to use a fast statistical coder (such as a table-driven Huffman coder) to represent the encoder output. This approach is used by the more successful Ziv-Lempel compression systems, which use a two-pass Huffman code to represent the output. A two-pass Huffman code gives similar compression to a single-pass adaptive one, but is many times faster for both encoding and decoding if a table-driven canonical code is used (e.g. Sieminski 1988). Other methods, such as arithmetic coding, could be used, but these tend to be slower and require relatively complex models to be maintained (Gutmann 1992).

The choice of method for updating pointers in the table of contexts also requires a compromise between compression and speed. In initial experiments we stored pointers to the k most recent occurrences of each context. The amount of book-keeping can be reduced by simply overwriting a randomly chosen pointer instead of the oldest one. The probability of consistently overwriting useful pointers is relatively low, and so this approach has relatively little effect on the compression performance. As with LZRW3, a suitable pseudo-random overwrite is achieved by a single counter that cycles through 1 to k to determine which pointer is to be replaced next.

4 Performance of the hybrid method

In this section we evaluate the effect of the different choices suggested in the previous section.
In order to determine how the parameters k (the number of pointers stored for each context) and m (the maximum match length) should be chosen, a simple hybrid compressor was implemented with a one-bit escape code followed by an 8-bit representation of the next character. The k pointers for each context were maintained using the pseudo-random cyclic overwrite. The system was used to compress the files of the Calgary corpus (Bell et al. 1990). Figure 2 shows how the compression performance of this method depends on k and m. The graph shows the average (unweighted) compression over all the files in the corpus. Figure 2 shows that compression improves as the number of phrases stored in each context increases, although the returns are diminishing by the time k = 64. The disadvantage of increasing k is that it causes a corresponding increase in encoding time due to the overall increase in the number of strings to search for matches, and it also requires more memory. If k is large and encoding speed is a problem then a more complex strategy than the simple linear search of the k entries could be used. The compression performance is relatively insensitive to the maximum match length, m, provided that it is greater than 8. The degradation in compression for longer matches could be avoided by using variable length codes for the match lengths, and we explore a simple form of this later.
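The cost per coding step of this simple configuration can be tallied directly (a sketch under the parameters just described; treating the escape probability as an input rather than measuring it is our simplification):

```python
import math

# Output cost, in bits, of one coding step of the simple hybrid coder:
# a 1-bit escape flag, followed either by an 8-bit explicit character
# (on escape) or by a phrase identifier of log2(k) bits plus a match
# length of log2(m) bits.
def step_bits(k, m, escape):
    if escape:
        return 1 + 8                 # flag + explicit 8-bit literal
    return 1 + int(math.log2(k)) + int(math.log2(m))   # flag + index + length
```

For k = 32 and m = 8 both branches happen to cost 9 bits, which is one reason that configuration packs so conveniently; the compression then depends entirely on how many symbols each phrase code covers.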

[Figure 2: Compression of a simple hybrid coder, in bits per character, plotted against the maximum match length m for several values of k, averaged over the Calgary corpus (k = number of phrases per context).]

To simplify coding, it is convenient if the phrase identifier and the match length can fit into one byte, that is, log2 k + log2 m = 8. The best compression that satisfies this constraint occurs when k = 32 and m = 8, where the corpus files are compressed to 4.12 bits per character on average (that is, files are reduced to approximately half their original size). This is remarkably good performance considering the speed and simplicity of the scheme, particularly for decoding.

A multi-bit encoding has been evaluated, where a two-bit flag is sent at each coding step. Table 1 shows how the four values of the flag are interpreted. Two of the values correspond to the two values of the one-bit flag used previously; the other two values are used for shorter encodings of literals and codewords respectively. The short literal coding represents characters in six bits. These characters are selected from an adaptive list of the 64 most recently used symbols. The short codeword still has log2 k bits to choose the phrase, but has fewer bits to represent the length. This takes advantage of the high frequency of short lengths.

  Flag   Use
  00     8 bits: literal
  01     6 bits: literal from the 64 most recently used symbols
  10     8 bits: index + length
  11     6 bits: index + short length

Table 1: Interpretation of the two-bit flag

The size of the flags has been chosen so that they can conveniently be packed into bytes to enable processing to be fast (Fenwick 1993). Four flags are packed into one byte, and flags are also stored in the two remaining bits of the 6-bit literals and codewords. Using two-bit flags is equivalent to a two-level coding method for characters and lengths, and so indicates the kind of improvement that can be expected from moving to variable-length codes.
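The packing of four two-bit flags into one byte can be sketched as follows (illustrative helpers, not the authors' code; the bit ordering within the byte is an assumption):

```python
# Sketch of two-bit flag packing: four flags fit in one byte, so the
# decoder can fetch a whole byte of flags and dispatch on two bits at a
# time, with no flag ever crossing a byte boundary.
FLAG_LITERAL8  = 0b00   # 8-bit literal follows
FLAG_LITERAL6  = 0b01   # 6-bit recently-used literal follows
FLAG_CODEWORD8 = 0b10   # 8 bits: index + length follows
FLAG_CODEWORD6 = 0b11   # 6 bits: index + short length follows

def pack_flags(flags):
    """Pack up to four two-bit flags into one byte, first flag in the
    low-order bit positions (ordering assumed)."""
    byte = 0
    for i, f in enumerate(flags):
        byte |= (f & 0b11) << (2 * i)
    return byte

def unpack_flags(byte):
    """Recover the four two-bit flags from a packed flag byte."""
    return [(byte >> (2 * i)) & 0b11 for i in range(4)]
```

A decoder reads one flag byte, then processes the next four items, never extracting a code that straddles bytes; this is the property the text credits for the method's I/O speed.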

The use of two-bit flags achieves compression of 3.57 bits per character (bpc) averaged over the Calgary corpus. This compares with 3.83 bpc for the best parameters using a single-bit flag. Table 2 shows comparable results achieved by other fast compression methods.

[Table 2: Compression performance of fast methods, averaged over the Calgary corpus.]

The compression of the hybrid method is not quite as good as that of Unix compress, but it has the advantage that the output fits conveniently into bytes and so is able to operate faster. The Gzip method is one of the best Ziv-Lempel based methods currently available, implemented by GNU. The version reported here was set for best compression. It achieves superior compression to the hybrid method at the expense of a more extensive search for matching phrases, and of using two passes to generate Huffman codes for the output.

The hybrid method described here is very fast for both encoding and decoding. Searching for matches involves evaluating just k matches, where k is typically between 8 and 64. Even faster coding is possible by reducing k. Decoding requires just two indirections to locate the phrase to be copied. Input and output are fast because no codes cross byte boundaries, and codes are easily inserted and extracted within bytes.

The memory requirements of the hybrid method are relatively low. Most of the memory is consumed by the window of prior characters and the index structure. A window of about 8 Kbytes is suitable (the experiments reported above used a window of 32K, which is slightly better). If k = 32 then the index uses 16 Kbytes of memory.

Analysing the output of the hybrid method reveals that the literal characters (i.e. escapes to zero-order) occur frequently, often more frequently than coded phrases.
Presumably this is because phrases tend to end when a novel character occurs, and so the first character at each coding step is less likely to have occurred in the first-order context. This suggests the possibility of always encoding a literal character, eliminating the need for the escape flag. Initial experiments have indicated that this degrades compression by just a few percent.

5 Conclusion

The idea of a hybrid between dictionary and symbolwise methods has several consequences. It demonstrates that the two approaches are more closely related than might be expected. The implementation described here suggests analogies between the two classes. For example, the escape symbol used by context coders performs a function that is analogous to that of the literal flag used by Ziv-Lempel methods. Characters that are difficult to predict cause a phrase boundary to occur in the hybrid method, indicating a

correlation between low probability symbols and phrase boundaries in Ziv-Lempel coders. The index that is used to locate previous occurrences of phrases for a Ziv-Lempel method is closely related to the data structure used by a context-based method for keeping statistics about the contexts. Ziv-Lempel coders use several different strategies to determine which phrases will be made available for coding; likewise, symbolwise models must determine which contexts are the most useful to store. These issues also become entangled with compromises caused by the choice of data structure, such as a hash table, a trie (digital search tree), or a simple look-up table.

These relationships raise the possibility of a very general model that includes the two approaches (and hybrid methods) as special cases. This in turn will help to formalise the continuum of tradeoffs between compression performance, memory usage, and speed. In our investigation we have pursued speed rather than compression performance, and have created a context-based method that is extremely fast. Many other trade-offs between speed and compression performance are possible, and work is continuing on this. For example, using a Huffman code for the output is likely to give significantly better compression. Fast approximate arithmetic coders might also be used for this purpose.

Acknowledgements

The authors are grateful to Timo Raita and Ross Williams for helpful comments on this work.

References

Bell, T.C. A unifying theory and improvements for existing approaches to text compression. Ph.D. thesis, Department of Computer Science, University of Canterbury, New Zealand.

Bell, T.C., Cleary, J.G. and Witten, I.H. (1990) Text compression. Prentice Hall, Englewood Cliffs, NJ.

Bell, T.C. and Witten, I.H. The relationship between greedy parsing and symbolwise text compression. J. ACM, in press.

Fenwick, P. (1993) Ziv-Lempel coding with multi-bit flags. Proceedings of DCC '93, April 1993, p.138.

Fiala, E.R. and Greene, D.H. (1989) Data compression with finite windows. Comm. ACM, (4).

Gutmann, P. (1992) Practical dictionary/arithmetic data compression synthesis. MSc thesis, University of Auckland, February 1992.

Langdon, G.G. (1983) A note on the Ziv-Lempel model for compressing individual sequences. IEEE Transactions on Information Theory, (2).

Raita, T. and Teuhola, J. (1987) Predictive text compression by hashing. Proceedings of the Tenth Annual International ACM SIGIR Conference, New Orleans.

Raita, T. (1987) Generalized coding algorithms for predictive text compression. Report AY7, Department of Computer Science, University of Turku, Finland.

Sieminski, A. (1988) Fast decoding of Huffman codes. Information Processing Letters, Vol. 26, No. 5 (January 1988), p.237.

Williams, R.N. (1991a) An extremely fast Ziv-Lempel data compression algorithm. Proceedings of DCC '91, p.362.

Williams, R.N. (1991b) Notes on the LZRW3 algorithm. Posted to the Usenet comp.compression newsgroup, June 1991.

Ziv, J. and Lempel, A. (1977) A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, IT-23(3).


More information

Indexing and Searching

Indexing and Searching Indexing and Searching Introduction How to retrieval information? A simple alternative is to search the whole text sequentially Another option is to build data structures over the text (called indices)

More information

Practical Fixed Length Lempel Ziv Coding

Practical Fixed Length Lempel Ziv Coding Practical Fixed Length Lempel Ziv Coding Shmuel T. Klein a, Dana Shapira b a Department of Computer Science, Bar Ilan University, Ramat Gan 52900, Israel tomi@cs.biu.ac.il b Dept. of Computer Science,

More information

Dictionary techniques

Dictionary techniques Dictionary techniques The final concept that we will mention in this chapter is about dictionary techniques. Many modern compression algorithms rely on the modified versions of various dictionary techniques.

More information

Gipfeli - High Speed Compression Algorithm

Gipfeli - High Speed Compression Algorithm Gipfeli - High Speed Compression Algorithm Rastislav Lenhardt I, II and Jyrki Alakuijala II I University of Oxford United Kingdom rastislav.lenhardt@cs.ox.ac.uk II Google Switzerland GmbH jyrki@google.com

More information

AAL 217: DATA STRUCTURES

AAL 217: DATA STRUCTURES Chapter # 4: Hashing AAL 217: DATA STRUCTURES The implementation of hash tables is frequently called hashing. Hashing is a technique used for performing insertions, deletions, and finds in constant average

More information

Lossless Compression Algorithms

Lossless Compression Algorithms Multimedia Data Compression Part I Chapter 7 Lossless Compression Algorithms 1 Chapter 7 Lossless Compression Algorithms 1. Introduction 2. Basics of Information Theory 3. Lossless Compression Algorithms

More information

CTW in Dasher: Summary and results.

CTW in Dasher: Summary and results. CTW in Dasher: Summary and results. After finishing my graduation thesis Using CTW as a language modeler in Dasher, I have visited the Inference group of the Physics department of the University of Cambridge,

More information

Basic Compression Library

Basic Compression Library Basic Compression Library Manual API version 1.2 July 22, 2006 c 2003-2006 Marcus Geelnard Summary This document describes the algorithms used in the Basic Compression Library, and how to use the library

More information

LZ UTF8. LZ UTF8 is a practical text compression library and stream format designed with the following objectives and properties:

LZ UTF8. LZ UTF8 is a practical text compression library and stream format designed with the following objectives and properties: LZ UTF8 LZ UTF8 is a practical text compression library and stream format designed with the following objectives and properties: 1. Compress UTF 8 and 7 bit ASCII strings only. No support for arbitrary

More information

5. Computational Geometry, Benchmarks and Algorithms for Rectangular and Irregular Packing. 6. Meta-heuristic Algorithms and Rectangular Packing

5. Computational Geometry, Benchmarks and Algorithms for Rectangular and Irregular Packing. 6. Meta-heuristic Algorithms and Rectangular Packing 1. Introduction 2. Cutting and Packing Problems 3. Optimisation Techniques 4. Automated Packing Techniques 5. Computational Geometry, Benchmarks and Algorithms for Rectangular and Irregular Packing 6.

More information

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints:

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints: CS231 Algorithms Handout # 31 Prof. Lyn Turbak November 20, 2001 Wellesley College Compression The Big Picture We want to be able to store and retrieve data, as well as communicate it with others. In general,

More information

Image Compression - An Overview Jagroop Singh 1

Image Compression - An Overview Jagroop Singh 1 www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 5 Issues 8 Aug 2016, Page No. 17535-17539 Image Compression - An Overview Jagroop Singh 1 1 Faculty DAV Institute

More information

DEFLATE COMPRESSION ALGORITHM

DEFLATE COMPRESSION ALGORITHM DEFLATE COMPRESSION ALGORITHM Savan Oswal 1, Anjali Singh 2, Kirthi Kumari 3 B.E Student, Department of Information Technology, KJ'S Trinity College Of Engineering and Research, Pune, India 1,2.3 Abstract

More information

Lempel-Ziv-Welch (LZW) Compression Algorithm

Lempel-Ziv-Welch (LZW) Compression Algorithm Lempel-Ziv-Welch (LZW) Compression lgorithm Introduction to the LZW lgorithm Example 1: Encoding using LZW Example 2: Decoding using LZW LZW: Concluding Notes Introduction to LZW s mentioned earlier, static

More information

LIPT-Derived Transform Methods Used in Lossless Compression of Text Files

LIPT-Derived Transform Methods Used in Lossless Compression of Text Files ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY Volume 14, Number 2, 2011, 149 158 LIPT-Derived Transform Methods Used in Lossless Compression of Text Files Radu RĂDESCU Politehnica University of

More information

PREDICTIVE CODING WITH NEURAL NETS: APPLICATION TO TEXT COMPRESSION

PREDICTIVE CODING WITH NEURAL NETS: APPLICATION TO TEXT COMPRESSION PREDICTIVE CODING WITH NEURAL NETS: APPLICATION TO TEXT COMPRESSION J iirgen Schmidhuber Fakultat fiir Informatik Technische Universitat Miinchen 80290 Miinchen, Germany Stefan Heil Abstract To compress

More information

Program Construction and Data Structures Course 1DL201 at Uppsala University Autumn 2010 / Spring 2011 Homework 6: Data Compression

Program Construction and Data Structures Course 1DL201 at Uppsala University Autumn 2010 / Spring 2011 Homework 6: Data Compression Program Construction and Data Structures Course 1DL201 at Uppsala University Autumn 2010 / Spring 2011 Homework 6: Data Compression Prepared by Pierre Flener Lab: Thursday 17 February 2011 Submission Deadline:

More information

CHAPTER II LITERATURE REVIEW

CHAPTER II LITERATURE REVIEW CHAPTER II LITERATURE REVIEW 2.1 BACKGROUND OF THE STUDY The purpose of this chapter is to study and analyze famous lossless data compression algorithm, called LZW. The main objective of the study is to

More information

Digital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay

Digital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay Digital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 29 Source Coding (Part-4) We have already had 3 classes on source coding

More information

A Quality of Service Decision Model for ATM-LAN/MAN Interconnection

A Quality of Service Decision Model for ATM-LAN/MAN Interconnection A Quality of Service Decision for ATM-LAN/MAN Interconnection N. Davies, P. Francis-Cobley Department of Computer Science, University of Bristol Introduction With ATM networks now coming of age, there

More information

On Additional Constrains in Lossless Compression of Text Files

On Additional Constrains in Lossless Compression of Text Files ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY Volume 18, Number 4, 2015, 299 311 On Additional Constrains in Lossless Compression of Text Files Radu RĂDESCU Politehnica University of Bucharest,

More information

Indexing. CS6200: Information Retrieval. Index Construction. Slides by: Jesse Anderton

Indexing. CS6200: Information Retrieval. Index Construction. Slides by: Jesse Anderton Indexing Index Construction CS6200: Information Retrieval Slides by: Jesse Anderton Motivation: Scale Corpus Terms Docs Entries A term incidence matrix with V terms and D documents has O(V x D) entries.

More information

The PackBits program on the Macintosh used a generalized RLE scheme for data compression.

The PackBits program on the Macintosh used a generalized RLE scheme for data compression. Tidbits on Image Compression (Above, Lena, unwitting data compression spokeswoman) In CS203 you probably saw how to create Huffman codes with greedy algorithms. Let s examine some other methods of compressing

More information

T. Bell and K. Pawlikowski University of Canterbury Christchurch, New Zealand

T. Bell and K. Pawlikowski University of Canterbury Christchurch, New Zealand The effect of data compression on packet sizes in data communication systems T. Bell and K. Pawlikowski University of Canterbury Christchurch, New Zealand Abstract.?????????? 1. INTRODUCTION Measurements

More information

Adaptive Compression of Graph Structured Text

Adaptive Compression of Graph Structured Text Adaptive Compression of Graph Structured Text John Gilbert and David M Abrahamson Department of Computer Science Trinity College Dublin {gilberj, david.abrahamson}@cs.tcd.ie Abstract In this paper we introduce

More information

CS/COE 1501

CS/COE 1501 CS/COE 1501 www.cs.pitt.edu/~lipschultz/cs1501/ Compression What is compression? Represent the same data using less storage space Can get more use out a disk of a given size Can get more use out of memory

More information

Memory Design. Cache Memory. Processor operates much faster than the main memory can.

Memory Design. Cache Memory. Processor operates much faster than the main memory can. Memory Design Cache Memory Processor operates much faster than the main memory can. To ameliorate the sitution, a high speed memory called a cache memory placed between the processor and main memory. Barry

More information

The Effect of Flexible Parsing for Dynamic Dictionary Based Data Compression

The Effect of Flexible Parsing for Dynamic Dictionary Based Data Compression The Effect of Flexible Parsing for Dynamic Dictionary Based Data Compression Yossi Matias Nasir Rajpoot Süleyman Cenk Ṣahinalp Abstract We report on the performance evaluation of greedy parsing with a

More information

Engineering Mathematics II Lecture 16 Compression

Engineering Mathematics II Lecture 16 Compression 010.141 Engineering Mathematics II Lecture 16 Compression Bob McKay School of Computer Science and Engineering College of Engineering Seoul National University 1 Lossless Compression Outline Huffman &

More information

Indexing. UCSB 290N. Mainly based on slides from the text books of Croft/Metzler/Strohman and Manning/Raghavan/Schutze

Indexing. UCSB 290N. Mainly based on slides from the text books of Croft/Metzler/Strohman and Manning/Raghavan/Schutze Indexing UCSB 290N. Mainly based on slides from the text books of Croft/Metzler/Strohman and Manning/Raghavan/Schutze All slides Addison Wesley, 2008 Table of Content Inverted index with positional information

More information

V.2 Index Compression

V.2 Index Compression V.2 Index Compression Heap s law (empirically observed and postulated): Size of the vocabulary (distinct terms) in a corpus E[ distinct terms in corpus] n with total number of term occurrences n, and constants,

More information

Improving LZW Image Compression

Improving LZW Image Compression European Journal of Scientific Research ISSN 1450-216X Vol.44 No.3 (2010), pp.502-509 EuroJournals Publishing, Inc. 2010 http://www.eurojournals.com/ejsr.htm Improving LZW Image Compression Sawsan A. Abu

More information

Optimal Parsing. In Dictionary-Symbolwise. Compression Algorithms

Optimal Parsing. In Dictionary-Symbolwise. Compression Algorithms Università degli Studi di Palermo Facoltà Di Scienze Matematiche Fisiche E Naturali Tesi Di Laurea In Scienze Dell Informazione Optimal Parsing In Dictionary-Symbolwise Compression Algorithms Il candidato

More information

Study of LZ77 and LZ78 Data Compression Techniques

Study of LZ77 and LZ78 Data Compression Techniques Study of LZ77 and LZ78 Data Compression Techniques Suman M. Choudhary, Anjali S. Patel, Sonal J. Parmar Abstract Data Compression is defined as the science and art of the representation of information

More information

Horn Formulae. CS124 Course Notes 8 Spring 2018

Horn Formulae. CS124 Course Notes 8 Spring 2018 CS124 Course Notes 8 Spring 2018 In today s lecture we will be looking a bit more closely at the Greedy approach to designing algorithms. As we will see, sometimes it works, and sometimes even when it

More information

Data Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression

Data Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression An overview of Compression Multimedia Systems and Applications Data Compression Compression becomes necessary in multimedia because it requires large amounts of storage space and bandwidth Types of Compression

More information

Text Compression. General remarks and Huffman coding Adobe pages Arithmetic coding Adobe pages 15 25

Text Compression. General remarks and Huffman coding Adobe pages Arithmetic coding Adobe pages 15 25 Text Compression General remarks and Huffman coding Adobe pages 2 14 Arithmetic coding Adobe pages 15 25 Dictionary coding and the LZW family Adobe pages 26 46 Performance considerations Adobe pages 47

More information

Lossless Image Compression having Compression Ratio Higher than JPEG

Lossless Image Compression having Compression Ratio Higher than JPEG Cloud Computing & Big Data 35 Lossless Image Compression having Compression Ratio Higher than JPEG Madan Singh madan.phdce@gmail.com, Vishal Chaudhary Computer Science and Engineering, Jaipur National

More information

A novel lossless data compression scheme based on the error correcting Hamming codes

A novel lossless data compression scheme based on the error correcting Hamming codes Computers and Mathematics with Applications 56 (2008) 143 150 www.elsevier.com/locate/camwa A novel lossless data compression scheme based on the error correcting Hamming codes Hussein Al-Bahadili Department

More information

Code Compression for RISC Processors with Variable Length Instruction Encoding

Code Compression for RISC Processors with Variable Length Instruction Encoding Code Compression for RISC Processors with Variable Length Instruction Encoding S. S. Gupta, D. Das, S.K. Panda, R. Kumar and P. P. Chakrabarty Department of Computer Science & Engineering Indian Institute

More information

Department of electronics and telecommunication, J.D.I.E.T.Yavatmal, India 2

Department of electronics and telecommunication, J.D.I.E.T.Yavatmal, India 2 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY LOSSLESS METHOD OF IMAGE COMPRESSION USING HUFFMAN CODING TECHNIQUES Trupti S Bobade *, Anushri S. sastikar 1 Department of electronics

More information

Compression of Concatenated Web Pages Using XBW

Compression of Concatenated Web Pages Using XBW Compression of Concatenated Web Pages Using XBW Radovan Šesták and Jan Lánský Charles University, Faculty of Mathematics and Physics, Department of Software Engineering Malostranské nám. 25, 118 00 Praha

More information

AN ANALYTICAL STUDY OF LOSSY COMPRESSION TECHINIQUES ON CONTINUOUS TONE GRAPHICAL IMAGES

AN ANALYTICAL STUDY OF LOSSY COMPRESSION TECHINIQUES ON CONTINUOUS TONE GRAPHICAL IMAGES AN ANALYTICAL STUDY OF LOSSY COMPRESSION TECHINIQUES ON CONTINUOUS TONE GRAPHICAL IMAGES Dr.S.Narayanan Computer Centre, Alagappa University, Karaikudi-South (India) ABSTRACT The programs using complex

More information

Eastern Mediterranean University School of Computing and Technology CACHE MEMORY. Computer memory is organized into a hierarchy.

Eastern Mediterranean University School of Computing and Technology CACHE MEMORY. Computer memory is organized into a hierarchy. Eastern Mediterranean University School of Computing and Technology ITEC255 Computer Organization & Architecture CACHE MEMORY Introduction Computer memory is organized into a hierarchy. At the highest

More information

Image compression. Stefano Ferrari. Università degli Studi di Milano Methods for Image Processing. academic year

Image compression. Stefano Ferrari. Università degli Studi di Milano Methods for Image Processing. academic year Image compression Stefano Ferrari Università degli Studi di Milano stefano.ferrari@unimi.it Methods for Image Processing academic year 2017 2018 Data and information The representation of images in a raw

More information

CSE 454. Index Compression Alta Vista PageRank

CSE 454. Index Compression Alta Vista PageRank CSE 454 Index Compression Alta Vista PageRank 1 Review t 1 d i q Vector Space Representation Dot Product as Similarity Metric d j t 2 TF-IDF for Computing Weights w ij = f(i,j) * log(n/n i ) Where q =

More information

Information Retrieval. Chap 7. Text Operations

Information Retrieval. Chap 7. Text Operations Information Retrieval Chap 7. Text Operations The Retrieval Process user need User Interface 4, 10 Text Text logical view Text Operations logical view 6, 7 user feedback Query Operations query Indexing

More information

Data Compression. Guest lecture, SGDS Fall 2011

Data Compression. Guest lecture, SGDS Fall 2011 Data Compression Guest lecture, SGDS Fall 2011 1 Basics Lossy/lossless Alphabet compaction Compression is impossible Compression is possible RLE Variable-length codes Undecidable Pigeon-holes Patterns

More information

A Research Paper on Lossless Data Compression Techniques

A Research Paper on Lossless Data Compression Techniques IJIRST International Journal for Innovative Research in Science & Technology Volume 4 Issue 1 June 2017 ISSN (online): 2349-6010 A Research Paper on Lossless Data Compression Techniques Prof. Dipti Mathpal

More information

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,

Chapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion, Introduction Chapter 5 Hashing hashing performs basic operations, such as insertion, deletion, and finds in average time 2 Hashing a hash table is merely an of some fixed size hashing converts into locations

More information

OPTIMIZATION OF LZW (LEMPEL-ZIV-WELCH) ALGORITHM TO REDUCE TIME COMPLEXITY FOR DICTIONARY CREATION IN ENCODING AND DECODING

OPTIMIZATION OF LZW (LEMPEL-ZIV-WELCH) ALGORITHM TO REDUCE TIME COMPLEXITY FOR DICTIONARY CREATION IN ENCODING AND DECODING Asian Journal Of Computer Science And Information Technology 2: 5 (2012) 114 118. Contents lists available at www.innovativejournal.in Asian Journal of Computer Science and Information Technology Journal

More information

Optimized Compression and Decompression Software

Optimized Compression and Decompression Software 2015 IJSRSET Volume 1 Issue 3 Print ISSN : 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Optimized Compression and Decompression Software Mohd Shafaat Hussain, Manoj Yadav

More information

Distributed source coding

Distributed source coding Distributed source coding Suppose that we want to encode two sources (X, Y ) with joint probability mass function p(x, y). If the encoder has access to both X and Y, it is sufficient to use a rate R >

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: Enhanced LZW (Lempel-Ziv-Welch) Algorithm by Binary Search with

More information

Hashing. Hashing Procedures

Hashing. Hashing Procedures Hashing Hashing Procedures Let us denote the set of all possible key values (i.e., the universe of keys) used in a dictionary application by U. Suppose an application requires a dictionary in which elements

More information

WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION

WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION V.KRISHNAN1, MR. R.TRINADH 2 1 M. Tech Student, 2 M. Tech., Assistant Professor, Dept. Of E.C.E, SIR C.R. Reddy college

More information

Dictionary-Based Fast Transform for Text Compression with High Compression Ratio

Dictionary-Based Fast Transform for Text Compression with High Compression Ratio Dictionary-Based Fast for Text Compression with High Compression Ratio Weifeng Sun Amar Mukherjee School of Electrical Engineering and Computer Science University of Central Florida Orlando, FL. 32816

More information

Chapter 5 VARIABLE-LENGTH CODING Information Theory Results (II)

Chapter 5 VARIABLE-LENGTH CODING Information Theory Results (II) Chapter 5 VARIABLE-LENGTH CODING ---- Information Theory Results (II) 1 Some Fundamental Results Coding an Information Source Consider an information source, represented by a source alphabet S. S = { s,

More information

Image Compression for Mobile Devices using Prediction and Direct Coding Approach

Image Compression for Mobile Devices using Prediction and Direct Coding Approach Image Compression for Mobile Devices using Prediction and Direct Coding Approach Joshua Rajah Devadason M.E. scholar, CIT Coimbatore, India Mr. T. Ramraj Assistant Professor, CIT Coimbatore, India Abstract

More information

Dictionary Based Compression for Images

Dictionary Based Compression for Images Dictionary Based Compression for Images Bruno Carpentieri Abstract Lempel-Ziv methods were original introduced to compress one-dimensional data (text, object codes, etc.) but recently they have been successfully

More information

Unified VLSI Systolic Array Design for LZ Data Compression

Unified VLSI Systolic Array Design for LZ Data Compression Unified VLSI Systolic Array Design for LZ Data Compression Shih-Arn Hwang, and Cheng-Wen Wu Dept. of EE, NTHU, Taiwan, R.O.C. IEEE Trans. on VLSI Systems Vol. 9, No.4, Aug. 2001 Pages: 489-499 Presenter:

More information

Ordered Indices To gain fast random access to records in a file, we can use an index structure. Each index structure is associated with a particular search key. Just like index of a book, library catalog,

More information

Lossy Color Image Compression Based on Singular Value Decomposition and GNU GZIP

Lossy Color Image Compression Based on Singular Value Decomposition and GNU GZIP Lossy Color Image Compression Based on Singular Value Decomposition and GNU GZIP Jila-Ayubi 1, Mehdi-Rezaei 2 1 Department of Electrical engineering, Meraaj Institue, Salmas, Iran jila.ayubi@gmail.com

More information