Dictionary techniques


The final concept that we will mention in this chapter is dictionary techniques. Many modern compression algorithms rely on modified versions of various dictionary techniques. The basic idea is to exploit the symbol repetitions inside the source.

Let us start with a very basic dictionary technique which was literally designed to compress the entries of a text dictionary. In this technique, the cluster of letters that a word shares with the front part of the previous word is replaced by a number which gives the length of the shared part. The following table shows English dictionary entries and their compressed (called front compression) counterparts:

Original        Front compressed
a               a
aardvark        1ardvark
aback           1back
abandon         3ndon
abandoning      7ing
abandonment     7ment
abasement       3sement
abash           4h
abate           3te
abated          5d
abbot           2bot
abbey           3ey
abbreviating    3reviating

Notice that the right column (compressed part) is somewhat shorter than the original dictionary entries. For example, "abandoning" shares its first 7 letters with "abandon", so it is stored as "7ing".

In general, we encode symbols which have not appeared anywhere before as they are, but symbols (or symbol sequences) which have occurred before are encoded only by a pointer to the previous occurrence.
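The front compression table above can be reproduced with a few lines of Python. This is only a sketch: the function names front_encode and front_decode are illustrative, and the decoder assumes that the words themselves contain no digits, so that the leading number can be told apart from the letters.

```python
def front_encode(words):
    """Replace the prefix shared with the previous word by its length."""
    encoded = []
    previous = ""
    for word in words:
        shared = 0
        limit = min(len(previous), len(word))
        while shared < limit and previous[shared] == word[shared]:
            shared += 1
        # A word sharing nothing with its predecessor is stored as it is.
        encoded.append((str(shared) if shared else "") + word[shared:])
        previous = word
    return encoded


def front_decode(encoded):
    """Invert front_encode, assuming the words contain no digits."""
    words = []
    previous = ""
    for entry in encoded:
        digits = 0
        while digits < len(entry) and entry[digits].isdigit():
            digits += 1
        shared = int(entry[:digits]) if digits else 0
        word = previous[:shared] + entry[digits:]
        words.append(word)
        previous = word
    return words
```

Because each entry depends on the previous one, the decoder has to read the list from the start; this is the price paid for the shorter entries.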

Move-to-front coding

The front coding scheme in the previous example shows that if two consecutive English words share many letters in their front part, front coding compresses them efficiently. The move-to-front coding algorithm (J. L. Bentley, 1986) tries to bring the more frequently occurring symbols to the front of a list of symbols. The reason for changing the positions of the symbols in the list is that the first positions in the list are represented with fewer bits than the last ones. For this reason, we first have to form a list of binary representations which satisfies two conditions:

1. The earlier binary numbers should be no longer than the later ones.
2. The binary codes must be uniquely decodable.

A commonly used binary list is as follows:
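A list with these two properties can be generated programmatically. The sketch below assumes the Elias gamma code, which is consistent with the prefix rule this section describes (a prefix of N bits followed by N-1 further bits); the function name gamma_codeword is illustrative and not taken from the original text.

```python
def gamma_codeword(n: int) -> str:
    """Return the Elias gamma codeword of a positive integer n as a bit string."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    binary = bin(n)[2:]                 # binary digits of n, always starting with '1'
    # The zeros plus the leading '1' form the N-bit prefix;
    # the remaining N-1 bits select one codeword within that prefix group.
    return "0" * (len(binary) - 1) + binary


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, gamma_codeword(n))
```

Under this assumption the list starts 1, 010, 011, 00100, 00101, 00110, 00111, and so on. Since gamma codewords are defined for n = 1, 2, 3, ..., a list position counted from 0 would be coded as gamma_codeword(position + 1).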

The above binary codewords are generated using a simple prefix technique. The leading bits (printed in red) represent the prefix. If the number of bits in the prefix is N, then the number of bits that follow the prefix is N-1. Using N-1 bits, we can generate 2^(N-1) different binary numbers. Here, for instance, when the prefix is 001, N=3, so N-1=2 and we can generate 4 codewords which have the prefix 001.

Exercise: In the continuation of the above list, how many numbers follow the prefix 00001? (We have shown only one of them in the above list.)

Using this method, we have obtained a suitable ordering of uniquely decodable binary numbers. The next stage is to use this list to encode our symbols. The move-to-front technique is an adaptive one which dynamically changes the binary representation of a symbol as new symbols arrive from the source. We try to maintain our alphabet (or symbol list) as a list where frequently occurring symbols are located near the front (where the codewords have fewer bits).

Exercise: Let us apply move-to-front coding to the following text:

"the boy on my right is the right boy"

We will take the words as the symbols. Step by step, this is what happens:

Initially, the list is empty. Counter is 0. The first symbol is "the". It does not exist in the list, so we emit the code "0the". It goes directly to the front of the list, and the list becomes {0:the}.

Counter is 1. The second symbol is "boy". It does not exist in the list, so we emit the code "1boy". Since it occurred later than "the", it is inserted at the top of the list (it moves to the front). The list is now {0:boy, 1:the}.

Counter is 2. The next symbol is "on". It does not exist in the list, so we emit the code "2on". Since it occurred later than "boy", it is inserted at the top of the list. The list is now {0:on, 1:boy, 2:the}.

Counter is 3. The next symbol is "my". It does not exist in the list, so we emit the code "3my". Since it occurred later than "on", it is inserted at the top of the list. The list is now {0:my, 1:on, 2:boy, 3:the}.

Counter is 4. The next symbol is "right". It does not exist in the list, so we emit the code "4right". Since it occurred later than "my", it is inserted at the top of the list. The list is now {0:right, 1:my, 2:on, 3:boy, 4:the}.

Counter is 5. The next symbol is "is". It does not exist in the list, so we emit the code "5is". Since it occurred later than "right", it is inserted at the top of the list. The list is now {0:is, 1:right, 2:my, 3:on, 4:boy, 5:the}.

Counter is 6. The next symbol is "the". Now, this symbol exists in the list, and its rank is 5. So we emit the code "5". Since "the" has occurred again and is now the most recently used symbol, we move it to the front and the list becomes {0:the, 1:is, 2:right, 3:my, 4:on, 5:boy}.

Counter is 7. The next symbol is "right". This symbol exists in the list, and its rank is 2. So we emit the code "2". "right" has now occurred as often as "the", but it came later, so it has the priority: we move "right" to the front and the list becomes {0:right, 1:the, 2:is, 3:my, 4:on, 5:boy}.

Counter is 8. The next symbol is "boy". This symbol exists in the list, and its rank is 5. So we emit the code "5". "boy" has now occurred as often as "the" and "right", but it came later, so it has the priority: we move "boy" to the front and the list becomes {0:boy, 1:right, 2:the, 3:is, 4:my, 5:on}.

The overall compressed data is: {0the 1boy 2on 3my 4right 5is 5 2 5}.

In this way, we not only expressed the repeating words with simple numbers, but also tried to use small numbers for them. The efficiency of this method becomes clearer when longer sources are used. Notice that the codebook that we use is time varying. This is a common property of most dictionary-based compression techniques.

Lempel-Ziv data compression

A famous family of compression algorithms is named after Lempel and Ziv, who developed their successful dictionary techniques in 1977 and 1978. The first one is called LZ77 and the second one is called LZ78.
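As an aside on the move-to-front walkthrough above, the procedure can be sketched in a few lines of Python. The names mtf_encode and mtf_decode are illustrative. A new symbol is represented here as a pair (list length, symbol), which corresponds to codes such as "0the" in the walkthrough, and a known symbol as its bare rank.

```python
def mtf_encode(symbols):
    """Word-level move-to-front coding as in the walkthrough.

    A symbol not yet in the list is emitted as (current list length, symbol);
    a known symbol is emitted as its current rank. In both cases the symbol
    then moves to the front of the list.
    """
    table = []                     # index 0 is the front of the list
    output = []
    for sym in symbols:
        if sym in table:
            rank = table.index(sym)
            output.append(rank)
            table.pop(rank)
        else:
            output.append((len(table), sym))
        table.insert(0, sym)
    return output


def mtf_decode(codes):
    """Rebuild the symbol sequence by mirroring the encoder's list updates."""
    table = []
    symbols = []
    for code in codes:
        if isinstance(code, tuple):    # (counter, new symbol)
            _, sym = code
        else:                          # rank of a known symbol
            sym = table.pop(code)
        table.insert(0, sym)
        symbols.append(sym)
    return symbols
```

The decoder needs no side information: it keeps the same list as the encoder and updates it in the same way after every code, which is what makes a time-varying codebook workable.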
Strangely, LZ78 is the simpler algorithm, so it was the first to be used in practice (its LZW variant is still used in the UNIX compress utility). With the improvement of computers, implementations of LZ77 became feasible, and LZ77-based methods such as the Deflate method of zip and gzip are now common, including in Windows-based compression utilities. These techniques differ from the other basic techniques in the following ways:

1. The number of encoded symbols and the bits per encoded symbol change continuously during compression (time varying).
2. There is no a-priori knowledge about the probabilities (or other statistics) of the input source. The system is totally adaptive.
3. The adaptation is such that the average code length per symbol is minimized as time evolves. Note: this behavior is called "universal coding".
4. They are very commonly used.

The general Lempel-Ziv algorithm parses the input stream into symbol groups that occur several times in the source. In this way, repeating patterns are coded more efficiently than with the basic method, as illustrated in the top figure. As an example, the following parsing and codebook generation illustrates an LZ78 coder:

[Figure: parsing and codebook generation of an LZ78 coder]

LZ77:

The algorithm encodes a sequence of length N which has been generated using M distinct symbols. In order to describe the algorithm, let us make the following definitions:

Input stream: the sequence of symbols to be compressed;
Symbol: the basic data element in the input stream;
Coding position: the position of the symbol in the input stream that is currently being coded (the beginning of the lookahead buffer);
Lookahead buffer: the symbol sequence from the coding position to the end of the input stream;
Window: the window of size W contains W symbols from the coding position backwards, i.e. the last W processed symbols;
P: a pointer which points to the match in the window and also specifies its length.

We will try to encode a sub-sequence in the input stream by locating the same sequence somewhere earlier in the input stream. The location of that earlier sequence is given by the pointer P, and the search for it is limited to the window of the last W symbols.

The algorithm searches the window for the longest match with the beginning of the lookahead buffer and outputs a pointer to that match. Since it is possible that not even a one-symbol match can be found, the output cannot contain just pointers. LZ77 solves this problem in the following way: after each pointer, it outputs the first symbol in the lookahead buffer after the match. If there is no match, it outputs a null-pointer and then outputs the symbol at the coding position. We can summarize this with the following encoding algorithm:

1. Set the coding position to the beginning of the input stream;
2. find the longest match in the window for the lookahead buffer;
3. output the pair (P,S) with the following meaning: P is the pointer to the match in the window; S is the first symbol in the lookahead buffer that didn't match;
4. if the lookahead buffer is not empty, move the coding position (and the window) L+1 symbols forward, where L is the length of the match, and return to step 2.

Exercise: Encode the following input using LZ77:

Position (P)   1  2  3  4  5  6  7  8  9
Symbol (S)     A  A  B  C  B  B  A  B  C

The encoding is done step by step as given in the following table:

Step   Position   Match    Symbol   Output
1.     1          (none)   A        (0,0) A
2.     2          A        B        (1,1) B
3.     4          (none)   C        (0,0) C
4.     5          B        B        (2,1) B
5.     7          A B      C        (5,2) C

The columns of the table can be described as follows:

"Step" indicates the number of the encoding step. A step is completed each time the encoder emits an output. With LZ77 this happens in each pass through step 3 of the described algorithm.
"Position" indicates the coding position. The first character in the input stream has the coding position 1.
"Match" shows the longest match found in the window.
"Symbol" shows the first symbol in the lookahead buffer after the match.
"Output" represents the emitted output in the format (B,L) S: (B,L) is the "beginning" and "length" information of the pointer (P) to the match. This gives the following instruction to the decoder: "Go back B symbols in the window and copy L symbols to the output"; S is the isolated symbol.

Let us decode the emitted symbols (the last column of the above table):

(0,0) A : Initially, output = {A}
(1,1) B : Go back 1 symbol, copy 1 symbol to the output, then emit B; output = {A A B}
(0,0) C : Go back 0 symbols and copy 0 symbols (this does nothing), then emit C; output = {A A B C}
(2,1) B : Go back 2 symbols and copy 1 symbol, then emit B; output = {A A B C B B}
(5,2) C : Go back 5 symbols and copy 2 symbols, then emit C; output = {A A B C B B A B C}

Notice that the encoder requires an extensive search over the earlier symbols to find the longest match. However, the decoder is very simple in terms of computational complexity. Although the compression was very efficient, the cost of the search was a bad point for slow computers, and therefore another algorithm (LZ78) was proposed.

Exercise: For the input stream ALITOPUALGEL, find the answers to the following three questions:

1. The distance (B) of a match in the text window.
2. The length (L) of the matched phrase.
3. The first symbol (S) in the lookahead buffer that follows the phrase.

An accompanying flash animation by Kemal Bayrakceken (in Turkish) illustrates LZ77 coding.

In practice, the need to emit three values (B, L and S) at every step is also a source of redundancy. Let us see how LZ78 eliminates these problems:

LZ78:

Once again, we need to define and clarify some of the terminology that we use here:

SymbolStream: a sequence of data to be encoded;
Symbol: the basic data element in the SymbolStream;
Prefix: a sequence of symbols that precede one symbol;
String: the prefix together with the symbol it precedes;
Code word: a basic data element in the codestream. It represents a string from the dictionary;
Codestream: the sequence of code words and symbols (the output of the encoding algorithm);
Dictionary: a table of strings. Every string is assigned a code word according to its index number in the dictionary;
Current prefix: the prefix currently being processed in the encoding algorithm. Denoted by P;
Current symbol: a symbol determined in the encoding algorithm. Generally this is the symbol preceded by the current prefix. Denoted by C;
Current code word: the code word currently processed in the decoding algorithm. Denoted by W;
":=" means "assignment".
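Before turning to the LZ78 encoder, the LZ77 procedure described earlier in this section can be sketched in Python. The names lz77_encode, lz77_decode and the default window_size are illustrative assumptions. The sketch follows the worked example in two respects that the text leaves implicit: a match is never allowed to consume the last symbol of the input, so that a following symbol S always exists, and among equally long matches the most recent one is chosen.

```python
def lz77_encode(data, window_size=32):
    """Return a list of ((B, L), S) triples for the string `data`."""
    output = []
    pos = 0                                   # coding position (0-based here)
    while pos < len(data):
        best_back, best_len = 0, 0
        window_start = max(0, pos - window_size)
        max_len = len(data) - pos - 1         # keep one symbol for S
        for start in range(window_start, pos):
            length = 0
            while length < max_len and data[start + length] == data[pos + length]:
                length += 1
            # ">=" prefers the most recent of several equally long matches
            if length > 0 and length >= best_len:
                best_back, best_len = pos - start, length
        output.append(((best_back, best_len), data[pos + best_len]))
        pos += best_len + 1                   # move L+1 symbols forward
    return output


def lz77_decode(triples):
    """Go back B symbols, copy L symbols, then emit S, for every triple."""
    out = []
    for (back, length), symbol in triples:
        start = len(out) - back
        for i in range(length):
            out.append(out[start + i])        # one symbol at a time, so a match
                                              # may overlap the text being written
        out.append(symbol)
    return "".join(out)
```

The encoder's nested loops make the cost of the search visible: every step scans the whole window, while the decoder only copies symbols.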
Using these definitions, we can now list the encoding algorithm. The steps write the current symbol as S:

1. Initially, the dictionary and P are empty;
2. S := next symbol in the symbolstream;
3. Is the string P+S present in the dictionary?
   a. if it is, P := P+S (extend P with S);
   b. if not,
      i. output these two objects to the codestream: the code word corresponding to P (if P is empty, output a zero); S, in the same form as input from the symbolstream;
      ii. add the string P+S to the dictionary;
      iii. P := empty;
4. Are there more symbols in the symbolstream?
   a. if yes, return to step 2;
   b. if not:
      i. if P is not empty, output the code word corresponding to P;
      ii. END.

The algorithm steps may look cluttered and difficult to comprehend. Let us also try to explain the algorithm step by step:

At the beginning of encoding the dictionary is empty. In order to explain the principle of encoding, let's consider a point within the encoding process, when the dictionary already contains some strings. We start analyzing a new prefix in the symbolstream, beginning with an empty prefix. If its corresponding string (the prefix plus the symbol after it, P+S) is present in the dictionary, the prefix is extended with the symbol S. This extension is repeated until we get a string which is not present in the dictionary. This is a very clever way of searching for the longest phrase that repeats itself. At the point where the extended prefix does not exist in the dictionary, we emit two outputs to the codestream: the code word that represents the prefix P, and then the symbol S. Then we add the whole string (P+S) to the dictionary and start processing the next prefix in the symbolstream.

Implementation note: A special case occurs if the dictionary does not contain even the single starting symbol (for example, this always happens in the first encoding step). In that case we output a special code word that represents an empty string, followed by this symbol, and add the symbol to the dictionary.

The output from this algorithm is a sequence of codeword-symbol pairs (W,S). Each time a pair is emitted to the codestream, the string from the dictionary corresponding to W is extended with the symbol S and the resulting string is added to the dictionary. Notice that when a new string is added to the dictionary, the dictionary already contains all the strings formed by removing characters from the end of the new string. For example, if ABBACB is added, then the dictionary should already contain the strings ABBAC, ABBA, ABB, AB, and A.

Exercise: Encode the following input using LZ78:

Position (P)   1  2  3  4  5  6  7  8  9
Symbol (S)     A  B  B  C  B  C  A  B  A

The encoding is done step by step as given in the following table:

Step   Position   Dictionary addition   Output
1.     1          A                     (0,A)
2.     2          B                     (0,B)
3.     3          BC                    (2,C)
4.     5          BCA                   (3,A)
5.     8          BA                    (2,A)

Let us describe the example operation:

The column Step indicates the number of the encoding step. Each encoding step is completed when step 3.b. of the encoding algorithm is executed.
The column Position indicates the current position in the input data.
The column Dictionary addition shows what string has been added to the dictionary. The index of the string is equal to the step number.
The column Output presents the output in the form (W,S). The output of each step decodes to the string that has been added to the dictionary.

The decoding is, again, quite simple.

Decode (0,A), emit A, the dictionary is {A}
Decode (0,B), emit B, the dictionary is {A, B}
Decode (2,C), the second entry in the dictionary is B, so emit BC. The dictionary is {A, B, BC}
Decode (3,A), the third entry in the dictionary is BC, so emit BCA. The dictionary is {A, B, BC, BCA}
Decode (2,A), the second entry in the dictionary is B, so emit BA. The dictionary is {A, B, BC, BCA, BA}

The overall emitted symbols are ABBCBCABA.

Important note: The emitted symbols during encoding are not necessarily represented like (0,A), or (2,C), etc. The use of parentheses greatly reduces efficiency. Usually, the parentheses are omitted. However, the prefix and symbol (for example, in (2,C), 2 is the prefix location and C is the symbol) must be separable during decoding. Otherwise how could you resolve 0A0B2C3A2A? The numbers like 2 and 3 could as well be symbols themselves. In practice, the compressed data uses the following syntax:

AB*2C*3A*2A

This eliminates the redundancy of the parentheses and the redundancy of indicating an index for the symbols that do not exist in the dictionary (previously indicated by something like (0,A), etc.). The symbol * followed by a number indicates the dictionary location. Of course, we assume that the symbol * does not exist in our source. If, exceptionally, the * symbol occurs, this is easily overcome by putting an escape sequence before *, i.e., we write **. This way, ** is decodable: since the first * is not followed by a number, the decoder can decide that we mean the symbol * itself.

An accompanying flash animation by Ahmet Gurbuz illustrates LZ78 coding. The MATLAB script lz78.m interactively encodes an entered string using the LZ78 algorithm (by Serhan Yavuz, requires ispresent.m).

Concluding remarks:

1. There are variations of and improvements over the classical LZ77 and LZ78 algorithms. LZSS and LZW are, perhaps, the most popular ones. LZW in particular is one of the most widely known compression algorithms; it is used, for example, in the GIF image format and in the UNIX compress utility.

2. Although we have considered the lossless compression schemes (Huffman, arithmetic, RLE, LZ, etc.) individually, you should not forget that they can also be used inside a lossy compression algorithm. Remember that we have covered scalar and vector quantization in the previous chapter. They were the steps that incorporated the loss into the coder. The result of quantization was a list of symbols from codebook entries. You should always keep in mind that the codebook can be considered as your alphabet, and the symbol list generated by the quantizer can be considered as a symbol stream which can be compressed using the described lossless coders. The students are urged to remember the overall block diagram of a typical signal coding algorithm.
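The LZ78 encoder and decoder described in this section can be sketched in Python as follows. The function names are illustrative, and one detail is an assumption of this sketch: when the input ends inside a phrase that is already in the dictionary, the final pair carries an empty string in place of the symbol.

```python
def lz78_encode(data):
    """Return a list of (W, S) pairs; W = 0 stands for the empty prefix."""
    dictionary = {}                 # string -> index, starting at 1
    output = []
    prefix = ""
    for symbol in data:
        if prefix + symbol in dictionary:
            prefix += symbol        # keep extending the prefix
        else:
            output.append((dictionary.get(prefix, 0), symbol))
            dictionary[prefix + symbol] = len(dictionary) + 1
            prefix = ""
    if prefix:
        # Input ended inside a known phrase: emit its code word alone.
        output.append((dictionary[prefix], ""))
    return output


def lz78_decode(pairs):
    """Rebuild the text while growing the same dictionary as the encoder."""
    phrases = [""]                  # index 0 is the empty string
    out = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol
        out.append(phrase)
        if symbol:                  # a final bare code word adds no entry
            phrases.append(phrase)
    return "".join(out)
```

Unlike the LZ77 sketch, the encoder performs no window search: each symbol costs a single dictionary lookup, which is the simplification that made LZ78 attractive on slow machines.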


15 Data Compression 2014/9/21. Objectives After studying this chapter, the student should be able to: 15-1 LOSSLESS COMPRESSION 15 Data Compression Data compression implies sending or storing a smaller number of bits. Although many methods are used for this purpose, in general these methods can be divided into two broad categories:

More information

CS/COE 1501

CS/COE 1501 CS/COE 1501 www.cs.pitt.edu/~nlf4/cs1501/ Compression What is compression? Represent the same data using less storage space Can get more use out a disk of a given size Can get more use out of memory E.g.,

More information

Image compression. Stefano Ferrari. Università degli Studi di Milano Methods for Image Processing. academic year

Image compression. Stefano Ferrari. Università degli Studi di Milano Methods for Image Processing. academic year Image compression Stefano Ferrari Università degli Studi di Milano stefano.ferrari@unimi.it Methods for Image Processing academic year 2017 2018 Data and information The representation of images in a raw

More information

7: Image Compression

7: Image Compression 7: Image Compression Mark Handley Image Compression GIF (Graphics Interchange Format) PNG (Portable Network Graphics) MNG (Multiple-image Network Graphics) JPEG (Join Picture Expert Group) 1 GIF (Graphics

More information

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression IMAGE COMPRESSION Image Compression Why? Reducing transportation times Reducing file size A two way event - compression and decompression 1 Compression categories Compression = Image coding Still-image

More information

CHAPTER II LITERATURE REVIEW

CHAPTER II LITERATURE REVIEW CHAPTER II LITERATURE REVIEW 2.1 BACKGROUND OF THE STUDY The purpose of this chapter is to study and analyze famous lossless data compression algorithm, called LZW. The main objective of the study is to

More information

Chapter 5 Lempel-Ziv Codes To set the stage for Lempel-Ziv codes, suppose we wish to nd the best block code for compressing a datavector X. Then we ha

Chapter 5 Lempel-Ziv Codes To set the stage for Lempel-Ziv codes, suppose we wish to nd the best block code for compressing a datavector X. Then we ha Chapter 5 Lempel-Ziv Codes To set the stage for Lempel-Ziv codes, suppose we wish to nd the best block code for compressing a datavector X. Then we have to take into account the complexity of the code.

More information

Analysis of Parallelization Effects on Textual Data Compression

Analysis of Parallelization Effects on Textual Data Compression Analysis of Parallelization Effects on Textual Data GORAN MARTINOVIC, CASLAV LIVADA, DRAGO ZAGAR Faculty of Electrical Engineering Josip Juraj Strossmayer University of Osijek Kneza Trpimira 2b, 31000

More information

Distributed source coding

Distributed source coding Distributed source coding Suppose that we want to encode two sources (X, Y ) with joint probability mass function p(x, y). If the encoder has access to both X and Y, it is sufficient to use a rate R >

More information

Fundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding

Fundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding Fundamentals of Multimedia Lecture 5 Lossless Data Compression Variable Length Coding Mahmoud El-Gayyar elgayyar@ci.suez.edu.eg Mahmoud El-Gayyar / Fundamentals of Multimedia 1 Data Compression Compression

More information

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints:

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints: CS231 Algorithms Handout # 31 Prof. Lyn Turbak November 20, 2001 Wellesley College Compression The Big Picture We want to be able to store and retrieve data, as well as communicate it with others. In general,

More information

Encoding. A thesis submitted to the Graduate School of University of Cincinnati in

Encoding. A thesis submitted to the Graduate School of University of Cincinnati in Lossless Data Compression for Security Purposes Using Huffman Encoding A thesis submitted to the Graduate School of University of Cincinnati in a partial fulfillment of requirements for the degree of Master

More information

I. Introduction II. Mathematical Context

I. Introduction II. Mathematical Context Data Compression Lucas Garron: August 4, 2005 I. Introduction In the modern era known as the Information Age, forms of electronic information are steadily becoming more important. Unfortunately, maintenance

More information

CIS 121 Data Structures and Algorithms with Java Spring 2018

CIS 121 Data Structures and Algorithms with Java Spring 2018 CIS 121 Data Structures and Algorithms with Java Spring 2018 Homework 6 Compression Due: Monday, March 12, 11:59pm online 2 Required Problems (45 points), Qualitative Questions (10 points), and Style and

More information

A New Compression Method Strictly for English Textual Data

A New Compression Method Strictly for English Textual Data A New Compression Method Strictly for English Textual Data Sabina Priyadarshini Department of Computer Science and Engineering Birla Institute of Technology Abstract - Data compression is a requirement

More information

Lecture 6 Review of Lossless Coding (II)

Lecture 6 Review of Lossless Coding (II) Shujun LI (李树钧): INF-10845-20091 Multimedia Coding Lecture 6 Review of Lossless Coding (II) May 28, 2009 Outline Review Manual exercises on arithmetic coding and LZW dictionary coding 1 Review Lossy coding

More information

Text Compression through Huffman Coding. Terminology

Text Compression through Huffman Coding. Terminology Text Compression through Huffman Coding Huffman codes represent a very effective technique for compressing data; they usually produce savings between 20% 90% Preliminary example We are given a 100,000-character

More information

Category: Informational May DEFLATE Compressed Data Format Specification version 1.3

Category: Informational May DEFLATE Compressed Data Format Specification version 1.3 Network Working Group P. Deutsch Request for Comments: 1951 Aladdin Enterprises Category: Informational May 1996 DEFLATE Compressed Data Format Specification version 1.3 Status of This Memo This memo provides

More information

Data Compression Techniques

Data Compression Techniques Data Compression Techniques Part 1: Entropy Coding Lecture 1: Introduction and Huffman Coding Juha Kärkkäinen 31.10.2017 1 / 21 Introduction Data compression deals with encoding information in as few bits

More information

MODELING DELTA ENCODING OF COMPRESSED FILES. and. and

MODELING DELTA ENCODING OF COMPRESSED FILES. and. and International Journal of Foundations of Computer Science c World Scientific Publishing Company MODELING DELTA ENCODING OF COMPRESSED FILES SHMUEL T. KLEIN Department of Computer Science, Bar-Ilan University

More information

EE67I Multimedia Communication Systems Lecture 4

EE67I Multimedia Communication Systems Lecture 4 EE67I Multimedia Communication Systems Lecture 4 Lossless Compression Basics of Information Theory Compression is either lossless, in which no information is lost, or lossy in which information is lost.

More information

Error Resilient LZ 77 Data Compression

Error Resilient LZ 77 Data Compression Error Resilient LZ 77 Data Compression Stefano Lonardi Wojciech Szpankowski Mark Daniel Ward Presentation by Peter Macko Motivation Lempel-Ziv 77 lacks any form of error correction Introducing a single

More information

Chapter 1. Digital Data Representation and Communication. Part 2

Chapter 1. Digital Data Representation and Communication. Part 2 Chapter 1. Digital Data Representation and Communication Part 2 Compression Digital media files are usually very large, and they need to be made smaller compressed Without compression Won t have storage

More information

HARDWARE IMPLEMENTATION OF LOSSLESS LZMA DATA COMPRESSION ALGORITHM

HARDWARE IMPLEMENTATION OF LOSSLESS LZMA DATA COMPRESSION ALGORITHM HARDWARE IMPLEMENTATION OF LOSSLESS LZMA DATA COMPRESSION ALGORITHM Parekar P. M. 1, Thakare S. S. 2 1,2 Department of Electronics and Telecommunication Engineering, Amravati University Government College

More information

An On-line Variable Length Binary. Institute for Systems Research and. Institute for Advanced Computer Studies. University of Maryland

An On-line Variable Length Binary. Institute for Systems Research and. Institute for Advanced Computer Studies. University of Maryland An On-line Variable Length inary Encoding Tinku Acharya Joseph F. Ja Ja Institute for Systems Research and Institute for Advanced Computer Studies University of Maryland College Park, MD 242 facharya,

More information

Modeling Delta Encoding of Compressed Files

Modeling Delta Encoding of Compressed Files Shmuel T. Klein 1, Tamar C. Serebro 1, and Dana Shapira 2 1 Department of Computer Science Bar Ilan University Ramat Gan, Israel tomi@cs.biu.ac.il, t lender@hotmail.com 2 Department of Computer Science

More information

Lossless compression II

Lossless compression II Lossless II D 44 R 52 B 81 C 84 D 86 R 82 A 85 A 87 A 83 R 88 A 8A B 89 A 8B Symbol Probability Range a 0.2 [0.0, 0.2) e 0.3 [0.2, 0.5) i 0.1 [0.5, 0.6) o 0.2 [0.6, 0.8) u 0.1 [0.8, 0.9)! 0.1 [0.9, 1.0)

More information

CSE 421 Greedy: Huffman Codes

CSE 421 Greedy: Huffman Codes CSE 421 Greedy: Huffman Codes Yin Tat Lee 1 Compression Example 100k file, 6 letter alphabet: File Size: ASCII, 8 bits/char: 800kbits 2 3 > 6; 3 bits/char: 300kbits a 45% b 13% c 12% d 16% e 9% f 5% Why?

More information

S 1. Evaluation of Fast-LZ Compressors for Compacting High-Bandwidth but Redundant Streams from FPGA Data Sources

S 1. Evaluation of Fast-LZ Compressors for Compacting High-Bandwidth but Redundant Streams from FPGA Data Sources Evaluation of Fast-LZ Compressors for Compacting High-Bandwidth but Redundant Streams from FPGA Data Sources Author: Supervisor: Luhao Liu Dr. -Ing. Thomas B. Preußer Dr. -Ing. Steffen Köhler 09.10.2014

More information

WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION

WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION V.KRISHNAN1, MR. R.TRINADH 2 1 M. Tech Student, 2 M. Tech., Assistant Professor, Dept. Of E.C.E, SIR C.R. Reddy college

More information

Basic Compression Library

Basic Compression Library Basic Compression Library Manual API version 1.2 July 22, 2006 c 2003-2006 Marcus Geelnard Summary This document describes the algorithms used in the Basic Compression Library, and how to use the library

More information

A Comparative Study Of Text Compression Algorithms

A Comparative Study Of Text Compression Algorithms International Journal of Wisdom Based Computing, Vol. 1 (3), December 2011 68 A Comparative Study Of Text Compression Algorithms Senthil Shanmugasundaram Department of Computer Science, Vidyasagar College

More information

Modeling Delta Encoding of Compressed Files

Modeling Delta Encoding of Compressed Files Modeling Delta Encoding of Compressed Files EXTENDED ABSTRACT S.T. Klein, T.C. Serebro, and D. Shapira 1 Dept of CS Bar Ilan University Ramat Gan, Israel tomi@cs.biu.ac.il 2 Dept of CS Bar Ilan University

More information

Department of electronics and telecommunication, J.D.I.E.T.Yavatmal, India 2

Department of electronics and telecommunication, J.D.I.E.T.Yavatmal, India 2 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY LOSSLESS METHOD OF IMAGE COMPRESSION USING HUFFMAN CODING TECHNIQUES Trupti S Bobade *, Anushri S. sastikar 1 Department of electronics

More information

You can say that again! Text compression

You can say that again! Text compression Activity 3 You can say that again! Text compression Age group Early elementary and up. Abilities assumed Copying written text. Time 10 minutes or more. Size of group From individuals to the whole class.

More information

A Method for Virtual Extension of LZW Compression Dictionary

A Method for Virtual Extension of LZW Compression Dictionary A Method for Virtual Extension of Compression Dictionary István Finta, Lóránt Farkas, Sándor Szénási and Szabolcs Sergyán Technology and Innovation, Nokia Networks, Köztelek utca 6, Budapest, Hungary Email:

More information

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201

Source Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201 Source Coding Basics and Speech Coding Yao Wang Polytechnic University, Brooklyn, NY1121 http://eeweb.poly.edu/~yao Outline Why do we need to compress speech signals Basic components in a source coding

More information

Image coding and compression

Image coding and compression Image coding and compression Robin Strand Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Today Information and Data Redundancy Image Quality Compression Coding

More information

OPTIMIZATION OF LZW (LEMPEL-ZIV-WELCH) ALGORITHM TO REDUCE TIME COMPLEXITY FOR DICTIONARY CREATION IN ENCODING AND DECODING

OPTIMIZATION OF LZW (LEMPEL-ZIV-WELCH) ALGORITHM TO REDUCE TIME COMPLEXITY FOR DICTIONARY CREATION IN ENCODING AND DECODING Asian Journal Of Computer Science And Information Technology 2: 5 (2012) 114 118. Contents lists available at www.innovativejournal.in Asian Journal of Computer Science and Information Technology Journal

More information

In this simple example, it is quite clear that there are exactly two strings that match the above grammar, namely: abc and abcc

In this simple example, it is quite clear that there are exactly two strings that match the above grammar, namely: abc and abcc JavaCC: LOOKAHEAD MiniTutorial 1. WHAT IS LOOKAHEAD The job of a parser is to read an input stream and determine whether or not the input stream conforms to the grammar. This determination in its most

More information

VC 12/13 T16 Video Compression

VC 12/13 T16 Video Compression VC 12/13 T16 Video Compression Mestrado em Ciência de Computadores Mestrado Integrado em Engenharia de Redes e Sistemas Informáticos Miguel Tavares Coimbra Outline The need for compression Types of redundancy

More information

An Advanced Text Encryption & Compression System Based on ASCII Values & Arithmetic Encoding to Improve Data Security

An Advanced Text Encryption & Compression System Based on ASCII Values & Arithmetic Encoding to Improve Data Security Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 10, October 2014,

More information

IMAGE COMPRESSION TECHNIQUES

IMAGE COMPRESSION TECHNIQUES IMAGE COMPRESSION TECHNIQUES A.VASANTHAKUMARI, M.Sc., M.Phil., ASSISTANT PROFESSOR OF COMPUTER SCIENCE, JOSEPH ARTS AND SCIENCE COLLEGE, TIRUNAVALUR, VILLUPURAM (DT), TAMIL NADU, INDIA ABSTRACT A picture

More information

Optimization of Bit Rate in Medical Image Compression

Optimization of Bit Rate in Medical Image Compression Optimization of Bit Rate in Medical Image Compression Dr.J.Subash Chandra Bose 1, Mrs.Yamini.J 2, P.Pushparaj 3, P.Naveenkumar 4, Arunkumar.M 5, J.Vinothkumar 6 Professor and Head, Department of CSE, Professional

More information

IMAGE COMPRESSION- I. Week VIII Feb /25/2003 Image Compression-I 1

IMAGE COMPRESSION- I. Week VIII Feb /25/2003 Image Compression-I 1 IMAGE COMPRESSION- I Week VIII Feb 25 02/25/2003 Image Compression-I 1 Reading.. Chapter 8 Sections 8.1, 8.2 8.3 (selected topics) 8.4 (Huffman, run-length, loss-less predictive) 8.5 (lossy predictive,

More information

Textual Data Compression Speedup by Parallelization

Textual Data Compression Speedup by Parallelization Textual Data Compression Speedup by Parallelization GORAN MARTINOVIC, CASLAV LIVADA, DRAGO ZAGAR Faculty of Electrical Engineering Josip Juraj Strossmayer University of Osijek Kneza Trpimira 2b, 31000

More information

Optimal Parsing. In Dictionary-Symbolwise. Compression Algorithms

Optimal Parsing. In Dictionary-Symbolwise. Compression Algorithms Università degli Studi di Palermo Facoltà Di Scienze Matematiche Fisiche E Naturali Tesi Di Laurea In Scienze Dell Informazione Optimal Parsing In Dictionary-Symbolwise Compression Algorithms Il candidato

More information

Horn Formulae. CS124 Course Notes 8 Spring 2018

Horn Formulae. CS124 Course Notes 8 Spring 2018 CS124 Course Notes 8 Spring 2018 In today s lecture we will be looking a bit more closely at the Greedy approach to designing algorithms. As we will see, sometimes it works, and sometimes even when it

More information

CS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding

CS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding CS6B Handout 34 Autumn 22 November 2 th, 22 Data Compression and Huffman Encoding Handout written by Julie Zelenski. In the early 98s, personal computers had hard disks that were no larger than MB; today,

More information

So, what is data compression, and why do we need it?

So, what is data compression, and why do we need it? In the last decade we have been witnessing a revolution in the way we communicate 2 The major contributors in this revolution are: Internet; The explosive development of mobile communications; and The

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Image Compression Caution: The PDF version of this presentation will appear to have errors due to heavy use of animations Material in this presentation is largely based on/derived

More information

Data Compression Fundamentals

Data Compression Fundamentals 1 Data Compression Fundamentals Touradj Ebrahimi Touradj.Ebrahimi@epfl.ch 2 Several classifications of compression methods are possible Based on data type :» Generic data compression» Audio compression»

More information

Implementation of Robust Compression Technique using LZ77 Algorithm on Tensilica s Xtensa Processor

Implementation of Robust Compression Technique using LZ77 Algorithm on Tensilica s Xtensa Processor 2016 International Conference on Information Technology Implementation of Robust Compression Technique using LZ77 Algorithm on Tensilica s Xtensa Processor Vasanthi D R and Anusha R M.Tech (VLSI Design

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: Enhanced LZW (Lempel-Ziv-Welch) Algorithm by Binary Search with

More information

6. Finding Efficient Compressions; Huffman and Hu-Tucker

6. Finding Efficient Compressions; Huffman and Hu-Tucker 6. Finding Efficient Compressions; Huffman and Hu-Tucker We now address the question: how do we find a code that uses the frequency information about k length patterns efficiently to shorten our message?

More information

Formal Languages and Compilers Lecture VI: Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal

More information

More Bits and Bytes Huffman Coding

More Bits and Bytes Huffman Coding More Bits and Bytes Huffman Coding Encoding Text: How is it done? ASCII, UTF, Huffman algorithm ASCII C A T Lawrence Snyder, CSE UTF-8: All the alphabets in the world Uniform Transformation Format: a variable-width

More information

15 July, Huffman Trees. Heaps

15 July, Huffman Trees. Heaps 1 Huffman Trees The Huffman Code: Huffman algorithm uses a binary tree to compress data. It is called the Huffman code, after David Huffman who discovered d it in 1952. Data compression is important in

More information