Dictionary techniques
The final concept that we will mention in this chapter is dictionary techniques. Many modern compression algorithms rely on modified versions of various dictionary techniques. The basic idea is to exploit the symbol repetitions inside the source. Let us start with a very basic dictionary technique, which was literally designed to compress the entries of a text dictionary. In this technique, the cluster of letters that a word shares with the front of the previous word is replaced by a number giving the length of that shared prefix. The following table shows a list of English dictionary entries and their compressed counterparts (called front compression):

    a             a
    aardvark      1ardvark
    aback         1back
    abandon       3ndon
    abandoning    7ing
    abandonment   7ment
    abasement     3sement
    abash         4h
    abate         3te
    abated        5d
    abbot         2bot
    abbey         3ey
    abbreviating  3reviating

Notice that the right column (the compressed form) is noticeably shorter than the original dictionary entries. In general, we encode symbols that have not appeared before as they are, while symbols (or symbol sequences) that have occurred before are encoded by a pointer to their previous occurrence.
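The front-compression rule above can be sketched in a few lines of Python (a sketch; the function name is illustrative):

```python
def front_compress(words):
    """Front compression: replace the prefix shared with the
    previous word by the length of that shared prefix."""
    out = []
    prev = ""
    for word in words:
        # Count how many leading characters match the previous entry.
        n = 0
        while n < min(len(prev), len(word)) and prev[n] == word[n]:
            n += 1
        # No shared prefix: emit the word as-is; otherwise emit
        # the prefix length followed by the remaining letters.
        out.append(word[n:] if n == 0 else f"{n}{word[n:]}")
        prev = word
    return out

entries = ["a", "aardvark", "aback", "abandon", "abandoning",
           "abandonment", "abasement", "abash", "abate", "abated",
           "abbot", "abbey", "abbreviating"]
print(front_compress(entries))
# ['a', '1ardvark', '1back', '3ndon', '7ing', '7ment', '3sement',
#  '4h', '3te', '5d', '2bot', '3ey', '3reviating']
```

Running it on the word list reproduces the right-hand column of the table above.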
Move-to-front coding

The front coding scheme in the previous example shows that if two consecutive English words share many letters at the front, we obtain efficient compression. The move-to-front coding algorithm (J. L. Bentley et al., 1986) tries to bring the more frequently occurring symbols to the front positions of a list of symbols. The reason for changing the positions of the symbols in the list is that the first symbols in the list are represented with fewer bits than the later ones. For this reason, we first have to form a list of binary representations which satisfies two conditions: 1. The first binary numbers should be shorter than the later ones. 2. The binary codes must be uniquely decodable. A commonly used binary list is as follows:
The above binary codewords are generated using a simple prefix technique (the prefix bits are shown in red in the original figure). If the number of bits in the prefix is N, then the number of bits that follow the prefix is N-1. Using N-1 bits, we can generate 2^(N-1) different binary numbers. Here, for instance, when the prefix is 001, N = 3, so N-1 = 2 and we can generate 4 codewords which have the prefix 001.

Exercise: In the continuation of the above list, how many numbers are there that follow the prefix 00001? (Only one of them is shown in the list.)

Using this method, we obtain a suitable ordering of uniquely decodable binary numbers. The next stage is to use this list to encode our symbols. Move-to-front is an adaptive technique which dynamically changes the binary representation of a symbol as new symbols arrive from the source. We try to maintain our alphabet (or symbol list) as a list in which frequently occurring symbols are located near the front (and therefore get fewer bits).

Exercise: Let us perform move-to-front coding on the following text: "the boy on my right is the right boy". We consider the symbols to be the words. Step by step, this is what happens:

Initially, the list is empty and the counter is 0.
1. The first symbol is "the". It is not in the list, so we emit the code "0the". It goes directly to the front of the list, which becomes {0:the}. Counter is 1.
2. The next symbol is "boy". It is not in the list, so we emit the code "1boy". It is inserted at the front of the list, which is now {0:boy, 1:the}. Counter is 2.
3. The next symbol is "on". It is not in the list, so we emit the code "2on". It moves to the front: {0:on, 1:boy, 2:the}. Counter is 3.
4. The next symbol is "my". It is not in the list, so we emit the code "3my".
It moves to the front: {0:my, 1:on, 2:boy, 3:the}. Counter is 4.
5. The next symbol is "right". It is not in the list, so we emit the code "4right". It moves to the front: {0:right, 1:my, 2:on, 3:boy,
4:the}. Counter is 5.
6. The next symbol is "is". It is not in the list, so we emit the code "5is". It moves to the front: {0:is, 1:right, 2:my, 3:on, 4:boy, 5:the}. Counter is 6.
7. The next symbol is "the". This symbol is in the list, at rank 5, so we emit the code "5". Having just occurred, "the" moves to the front: {0:the, 1:is, 2:right, 3:my, 4:on, 5:boy}. Counter is 7.
8. The next symbol is "right". It is in the list at rank 2, so we emit the code "2". "right" moves to the front: {0:right, 1:the, 2:is, 3:my, 4:on, 5:boy}. Counter is 8.
9. The next symbol is "boy". It is in the list at rank 5, so we emit the code "5". "boy" moves to the front: {0:boy, 1:right, 2:the, 3:is, 4:my, 5:on}.

The overall compressed data is: {0the 1boy 2on 3my 4right 5is 5 2 5}. In this way, we not only expressed the repeating words with simple numbers, but also tended to use smaller numbers for them. The efficiency of this method becomes clearer when longer sources are used. Notice that the codebook we use is time-varying; this is a common property of most dictionary-based compression techniques.

Lempel-Ziv data compression

A famous family of compression algorithms is named after Lempel and Ziv, who developed their successful dictionary techniques in 1977 and 1978. The first algorithm is called LZ77 and the second one LZ78.
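Before turning to the Lempel-Ziv algorithms, the move-to-front walk-through above can be condensed into a short Python sketch (illustrative names; it emits "<rank><word>" for a new word and the bare rank for a known one, exactly as in the example):

```python
def move_to_front_encode(words):
    """Word-level move-to-front coding: a new word is emitted as
    '<rank><word>' (rank = current list size), a known word as its
    current rank; in both cases the word then moves to the front."""
    lst, out = [], []
    for w in words:
        if w in lst:
            out.append(str(lst.index(w)))  # known word: emit its rank
            lst.remove(w)
        else:
            out.append(f"{len(lst)}{w}")   # new word: rank + literal
        lst.insert(0, w)                   # move (or insert) to front
    return out

text = "the boy on my right is the right boy".split()
print(move_to_front_encode(text))
# ['0the', '1boy', '2on', '3my', '4right', '5is', '5', '2', '5']
```

The printed output matches the compressed data {0the 1boy 2on 3my 4right 5is 5 2 5} derived step by step above.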
Somewhat surprisingly, LZ78 is the simpler algorithm, so it was put to practical use first (its LZW variant is still used in the UNIX compress utility and in the GIF image format). As computers improved, implementations of LZ77 became feasible, and LZ77 variants underlie zip/gzip and most Windows-based compression utilities. These techniques differ from the other basic techniques in the following ways: the number of encoded symbols and the bits per encoded symbol change continuously during compression (they are time-varying); there is no a priori knowledge about the probabilities (or other statistics) of the input source, so the system is totally adaptive; and the adaptation is such that the average code length per symbol is minimized as time evolves (this behavior is called "universal coding"). They are very commonly used. The general Lempel-Ziv algorithm parses the input stream into phrases that occur several times in the source. In this way, the repeating patterns become more
efficiently encoded than with the basic method, as illustrated in the figure. (The original notes illustrate this with a parsing and codebook-generation figure for an LZ78 coder.)

LZ77: The algorithm encodes a sequence of length N which has been generated using M distinct symbols. In order to describe the algorithm, let us make the following definitions:

- Input stream: the sequence of symbols to be compressed.
- Symbol: the basic data element in the input stream.
- Coding position: the position of the symbol in the input stream that is currently being coded (the beginning of the lookahead buffer).
- Lookahead buffer: the symbol sequence from the coding position to the end of the input stream.
- Window: the window of size W contains the W characters before the coding position, i.e. the last W processed symbols.
- P: a pointer which points to the match in the window and also specifies its length.

We will try to encode a subsequence of the input stream by locating the same sequence somewhere earlier in the stream. The location of that earlier copy corresponds to the pointer P, and the search range is the window of size W.

The algorithm searches the window for the longest match with the beginning of the lookahead buffer and outputs a pointer to that match. Since it is possible that not even a one-symbol match can be found, the output cannot consist of pointers alone. LZ77 solves this problem in the following way: after each pointer, it outputs the first symbol in the lookahead buffer after the match; if there is no match, it outputs a null pointer and then the symbol at the coding position. We can summarize this with the following encoding algorithm:

1. Set the coding position to the beginning of the input stream.
2. Find the longest match in the window for the lookahead buffer.
3. Output the pair (P, S), where P is the pointer to the match in the window and S is the first symbol in the lookahead buffer that did not match.
4. If the lookahead buffer is not empty, move the coding position (and the window) L+1 symbols forward and return to step 2.

Exercise: Encode the following input using LZ77:

    Position (P):  1 2 3 4 5 6 7 8 9
    Symbol (S):    A A B C B B A B C

The encoding proceeds step by step as given in the following table:

    Step  Position  Match  Symbol  Output
    1.    1         -      A       (0,0) A
    2.    2         A      B       (1,1) B
    3.    4         -      C       (0,0) C
    4.    5         B      B       (2,1) B
    5.    7         A B    C       (5,2) C

The table can be read as follows. "Step" indicates the number of the encoding step; a step completes each time the encoder emits an output, which with LZ77 happens on each pass through step 3 of the algorithm. "Position" indicates the coding position; the first character in the input stream has coding position 1. "Match" shows the longest match found in the window. "Symbol" shows the first symbol in the lookahead buffer after the match. "Output" represents the emitted output in the format (B,L) S, where (B,L) is the "beginning" (backward distance) and "length" of the pointer P to the match. This gives the following instruction to the decoder: "go back B symbols in the window and copy L symbols to the output"; S is the isolated symbol.

Let us decode the emitted symbols (the last column of the above table):

(0,0) A : initially, output = {A}
(1,1) B : go back 1 symbol, copy 1 symbol to the output, then emit B; output = {A A B}
(0,0) C : go back 0 symbols and copy 0 symbols (which does nothing), then emit C; output = {A A B C}
(2,1) B : go back 2 symbols and copy 1 symbol, then emit B; output = {A A B C B B}
(5,2) C : go back 5 symbols and copy 2 symbols, then emit C; output = {A A B C B B A B C}

Notice that the encoder requires an extensive search for repeating characters to find the longest match, whereas the decoder is very simple in terms of computational complexity. Although the compression was very efficient, this search cost was a drawback on slow computers, and it motivated another algorithm, LZ78.

Exercise: For the input stream ALITOPUALGEL, find the distance (B) of a match in the text window, the length (L) of the matched phrase, and the first symbol (S) in the lookahead buffer that follows the phrase.

A flash animation by Kemal Bayrakceken (in Turkish) illustrating LZ77 coding accompanies these notes.

In practice, the necessity of emitting three code words for each output is itself a source of redundancy. Let us see how LZ78 eliminates these problems.

LZ78: Once again, we need to define and clarify some of the terminology that we use here:

- Symbol stream: the sequence of data to be encoded.
- Symbol: the basic data element in the symbol stream.
- Prefix: a sequence of symbols that precedes one symbol.
- String: the prefix together with the symbol it precedes.
- Code word: a basic data element in the codestream; it represents a string from the dictionary.
- Codestream: the sequence of code words and symbols (the output of the encoding algorithm).
- Dictionary: a table of strings; every string is assigned a code word according to its index number in the dictionary.
- Current prefix: the prefix currently being processed in the encoding algorithm; denoted by P.
- Current symbol: a symbol determined in the encoding algorithm, generally the symbol preceded by the current prefix; denoted by S.
- Current code word: the code word currently processed in the decoding algorithm; denoted by W.
- ":=" means "assignment".
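Before continuing with LZ78's encoding algorithm, the LZ77 procedure traced in the exercise above can be sketched in Python (a sketch only: the function names and window size are illustrative, real implementations bound the lookahead buffer and bit-pack the (B,L) S triples):

```python
def lz77_encode(data, window=16):
    """LZ77 sketch: emit (back, length, symbol) triples, i.e. the
    (B,L) S format used in the worked example."""
    out, pos = [], 0
    while pos < len(data):
        best_b, best_l = 0, 0
        # Search the window for the longest match with the lookahead
        # buffer, always leaving one symbol to emit literally.
        for back in range(1, min(window, pos) + 1):
            l = 0
            while (pos + l < len(data) - 1
                   and data[pos + l - back] == data[pos + l]):
                l += 1
            if l > best_l:
                best_b, best_l = back, l
        out.append((best_b, best_l, data[pos + best_l]))
        pos += best_l + 1          # advance L+1 symbols
    return out

def lz77_decode(triples):
    out = []
    for back, length, sym in triples:
        for _ in range(length):    # "go back B, copy L symbols"
            out.append(out[-back])
        out.append(sym)            # then emit the isolated symbol
    return "".join(out)

codes = lz77_encode("AABCBBABC")
print(codes)                # [(0, 0, 'A'), (1, 1, 'B'), (0, 0, 'C'),
                            #  (2, 1, 'B'), (5, 2, 'C')]
print(lz77_decode(codes))   # AABCBBABC
```

Running it on the exercise input A A B C B B A B C reproduces the output column of the encoding table, and decoding recovers the original sequence.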
Using the definitions above, we can now list the encoding algorithm:
1. Initially, the dictionary and P are empty.
2. S := the next symbol in the symbol stream.
3. Is the string P+S present in the dictionary?
   - If it is, P := P+S (extend P with S).
   - If not:
     i. output these two objects to the codestream: the code word corresponding to P (if P is empty, output a zero), and S in the same form as it was read from the symbol stream;
     ii. add the string P+S to the dictionary;
     iii. P := empty.
4. Are there more symbols in the symbol stream?
   - If yes, return to step 2.
   - If not:
     i. if P is not empty, output the code word corresponding to P;
     ii. END.

The algorithm steps may look cluttered and difficult to comprehend, so let us also explain them step by step. At the beginning of encoding the dictionary is empty; to explain the principle, consider instead a point within the encoding process when the dictionary already contains some strings. We start analyzing a new prefix in the symbol stream, beginning with an empty prefix. If the corresponding string (the prefix plus the symbol after it, P+S) is present in the dictionary, the prefix is extended with that symbol. This extension is repeated until we get a string which is not present in the dictionary; this is a very economical way of searching for the longest phrase that repeats itself. At the point where the extended string does not exist in the dictionary, we emit two outputs to the codestream: the code word that represents the prefix P, and then the symbol S. We then add the whole string P+S to the dictionary and start processing the next prefix in the symbol stream.

Implementation note: a special case occurs when the dictionary does not contain even the single starting symbol (for example, this always happens in the first encoding step). In that case we output a special code word that represents an empty string, followed by the symbol itself, and add that symbol to the dictionary.

The output of this algorithm is a sequence of code word/symbol pairs (W,S).
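The algorithm above can be sketched directly in Python (a sketch; the function names and the (W,S) tuple representation are illustrative, following the pairs used in these notes):

```python
def lz78_encode(data):
    """LZ78 sketch: emit (W, S) pairs, where W is the dictionary
    index of the longest known prefix (0 for the empty prefix) and
    S is the symbol that follows it."""
    dictionary = {}            # string -> 1-based index
    out, p = [], ""
    for s in data:
        if p + s in dictionary:
            p += s             # extend the current prefix
        else:
            out.append((dictionary.get(p, 0), s))
            dictionary[p + s] = len(dictionary) + 1
            p = ""
    if p:                      # flush a pending prefix at the end
        out.append((dictionary[p], ""))
    return out

def lz78_decode(pairs):
    strings, out = [""], []    # index 0 is the empty string
    for w, s in pairs:
        entry = strings[w] + s # look up prefix, append symbol
        out.append(entry)
        strings.append(entry)  # grow the dictionary identically
    return "".join(out)

pairs = lz78_encode("ABBCBCABA")
print(pairs)                # [(0, 'A'), (0, 'B'), (2, 'C'),
                            #  (3, 'A'), (2, 'A')]
print(lz78_decode(pairs))   # ABBCBCABA
```

On the input ABBCBCABA used in the exercise below, the encoder produces exactly the output column of the encoding table, and the decoder rebuilds the same dictionary to recover the input.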
Each time a pair is emitted to the codestream, the string from the dictionary corresponding to W, extended with the symbol S, is added to the dictionary. Notice that when a new string is added, the dictionary already contains all of its prefixes, i.e. all the substrings formed by removing characters from the end of the new string. For example, if ABBACB is added, then the dictionary already contains ABBAC, ABBA, ABB, AB, and A.

Exercise: Encode the following input using LZ78:

    Position (P):  1 2 3 4 5 6 7 8 9
    Symbol (S):    A B B C B C A B A

The encoding proceeds step by step as given in the following table:

    Step  Position  Dictionary addition  Output
    1.    1         A                    (0,A)
    2.    2         B                    (0,B)
    3.    3         BC                   (2,C)
    4.    5         BCA                  (3,A)
    5.    8         BA                   (2,A)

Let us describe the example operation. The column Step indicates the number of the encoding step; each encoding step is completed when step 3.ii of the encoding algorithm is executed. The column Position indicates the current position in the input data. The column Dictionary addition shows the string that has been added to the dictionary; the index of the string equals the step number. The column Output presents the output in the form (W,S); the output of each step decodes to exactly the string that was added to the dictionary.

The decoding is, again, quite simple:

Decode (0,A): emit A; the dictionary is {A}.
Decode (0,B): emit B; the dictionary is {A, B}.
Decode (2,C): the second entry in the dictionary is B, so emit BC; the dictionary is {A, B, BC}.
Decode (3,A): the third entry in the dictionary is BC, so emit BCA; the dictionary is {A, B, BC, BCA}.
Decode (2,A): the second entry in the dictionary is B, so emit BA; the dictionary is {A, B, BC, BCA, BA}.

The overall decoded output is ABBCBCABA.

Important note: the emitted pairs are not necessarily represented as (0,A), (2,C), etc. Writing out the parentheses greatly reduces efficiency, so they are usually omitted. However, the prefix index and the symbol (for example, in (2,C), 2 is the dictionary index and C is the symbol) must still be separable during decoding; otherwise, how could you resolve 0A0B2C3A2A? The digits 2 and 3 could just as well be symbols themselves. In practice, the compressed data can use the following syntax:

    AB*2C*3A*2A

This eliminates the redundancy of the parentheses and the redundancy of writing an index of 0 for symbols that are not yet in the dictionary (previously indicated by pairs like (0,A)). The symbol * followed by a number indicates the
dictionary location. Of course, we assume that the symbol * does not occur in our source. If, exceptionally, * does occur, it is easily handled with an escape sequence: we write **. This remains decodable because, when * is not followed by a number, the decoder can conclude that the symbol * itself is meant.

A flash animation by Ahmet Gurbuz illustrates LZ78 coding, and the MATLAB script lz78.m (by Serhan Yavuz, requires ispresent.m) interactively encodes an entered string using the LZ78 algorithm.

Concluding remarks:

1. There are variations of and improvements over the classical LZ77 and LZ78 algorithms; LZSS and LZW are perhaps the most popular ones. Indeed, LZW is one of the dominant compression algorithms, used for instance in the UNIX compress utility and the GIF image format.
2. Although we have considered the lossless compression schemes (Huffman, arithmetic, RLE, LZ, etc.) individually, you should not forget that they can also be used inside lossy compression algorithms. Remember that we covered scalar and vector quantization in the previous chapter; they were the steps that introduced the loss into the coder. The result of quantization was a list of symbols drawn from the codebook entries. Keep in mind that the codebook can be considered your alphabet, and the symbol list generated by the quantizer is a symbol stream that can be compressed using the lossless coders described here. Students are urged to remember the overall block diagram of a typical signal coding algorithm.

A Java applet that compresses an entered string online using the LZW algorithm is also available.

Available links: you can find, literally, zillions of pages on the LZ compression algorithms. Here are a few:

1. Lempel-Ziv compression algorithms
2. Interactive LZW compression
3. Lempel-Ziv-Welch Compression (LZW)
4. Lempel-Ziv compression of a file
5. Lempel-Ziv file compression
More informationEncoding. A thesis submitted to the Graduate School of University of Cincinnati in
Lossless Data Compression for Security Purposes Using Huffman Encoding A thesis submitted to the Graduate School of University of Cincinnati in a partial fulfillment of requirements for the degree of Master
More informationI. Introduction II. Mathematical Context
Data Compression Lucas Garron: August 4, 2005 I. Introduction In the modern era known as the Information Age, forms of electronic information are steadily becoming more important. Unfortunately, maintenance
More informationCIS 121 Data Structures and Algorithms with Java Spring 2018
CIS 121 Data Structures and Algorithms with Java Spring 2018 Homework 6 Compression Due: Monday, March 12, 11:59pm online 2 Required Problems (45 points), Qualitative Questions (10 points), and Style and
More informationA New Compression Method Strictly for English Textual Data
A New Compression Method Strictly for English Textual Data Sabina Priyadarshini Department of Computer Science and Engineering Birla Institute of Technology Abstract  Data compression is a requirement
More informationLecture 6 Review of Lossless Coding (II)
Shujun LI (李树钧): INF1084520091 Multimedia Coding Lecture 6 Review of Lossless Coding (II) May 28, 2009 Outline Review Manual exercises on arithmetic coding and LZW dictionary coding 1 Review Lossy coding
More informationText Compression through Huffman Coding. Terminology
Text Compression through Huffman Coding Huffman codes represent a very effective technique for compressing data; they usually produce savings between 20% 90% Preliminary example We are given a 100,000character
More informationCategory: Informational May DEFLATE Compressed Data Format Specification version 1.3
Network Working Group P. Deutsch Request for Comments: 1951 Aladdin Enterprises Category: Informational May 1996 DEFLATE Compressed Data Format Specification version 1.3 Status of This Memo This memo provides
More informationData Compression Techniques
Data Compression Techniques Part 1: Entropy Coding Lecture 1: Introduction and Huffman Coding Juha Kärkkäinen 31.10.2017 1 / 21 Introduction Data compression deals with encoding information in as few bits
More informationMODELING DELTA ENCODING OF COMPRESSED FILES. and. and
International Journal of Foundations of Computer Science c World Scientific Publishing Company MODELING DELTA ENCODING OF COMPRESSED FILES SHMUEL T. KLEIN Department of Computer Science, BarIlan University
More informationEE67I Multimedia Communication Systems Lecture 4
EE67I Multimedia Communication Systems Lecture 4 Lossless Compression Basics of Information Theory Compression is either lossless, in which no information is lost, or lossy in which information is lost.
More informationError Resilient LZ 77 Data Compression
Error Resilient LZ 77 Data Compression Stefano Lonardi Wojciech Szpankowski Mark Daniel Ward Presentation by Peter Macko Motivation LempelZiv 77 lacks any form of error correction Introducing a single
More informationChapter 1. Digital Data Representation and Communication. Part 2
Chapter 1. Digital Data Representation and Communication Part 2 Compression Digital media files are usually very large, and they need to be made smaller compressed Without compression Won t have storage
More informationHARDWARE IMPLEMENTATION OF LOSSLESS LZMA DATA COMPRESSION ALGORITHM
HARDWARE IMPLEMENTATION OF LOSSLESS LZMA DATA COMPRESSION ALGORITHM Parekar P. M. 1, Thakare S. S. 2 1,2 Department of Electronics and Telecommunication Engineering, Amravati University Government College
More informationAn Online Variable Length Binary. Institute for Systems Research and. Institute for Advanced Computer Studies. University of Maryland
An Online Variable Length inary Encoding Tinku Acharya Joseph F. Ja Ja Institute for Systems Research and Institute for Advanced Computer Studies University of Maryland College Park, MD 242 facharya,
More informationModeling Delta Encoding of Compressed Files
Shmuel T. Klein 1, Tamar C. Serebro 1, and Dana Shapira 2 1 Department of Computer Science Bar Ilan University Ramat Gan, Israel tomi@cs.biu.ac.il, t lender@hotmail.com 2 Department of Computer Science
More informationLossless compression II
Lossless II D 44 R 52 B 81 C 84 D 86 R 82 A 85 A 87 A 83 R 88 A 8A B 89 A 8B Symbol Probability Range a 0.2 [0.0, 0.2) e 0.3 [0.2, 0.5) i 0.1 [0.5, 0.6) o 0.2 [0.6, 0.8) u 0.1 [0.8, 0.9)! 0.1 [0.9, 1.0)
More informationCSE 421 Greedy: Huffman Codes
CSE 421 Greedy: Huffman Codes Yin Tat Lee 1 Compression Example 100k file, 6 letter alphabet: File Size: ASCII, 8 bits/char: 800kbits 2 3 > 6; 3 bits/char: 300kbits a 45% b 13% c 12% d 16% e 9% f 5% Why?
More informationS 1. Evaluation of FastLZ Compressors for Compacting HighBandwidth but Redundant Streams from FPGA Data Sources
Evaluation of FastLZ Compressors for Compacting HighBandwidth but Redundant Streams from FPGA Data Sources Author: Supervisor: Luhao Liu Dr. Ing. Thomas B. Preußer Dr. Ing. Steffen Köhler 09.10.2014
More informationWIRE/WIRELESS SENSOR NETWORKS USING KRLE ALGORITHM FOR A LOW POWER DATA COMPRESSION
WIRE/WIRELESS SENSOR NETWORKS USING KRLE ALGORITHM FOR A LOW POWER DATA COMPRESSION V.KRISHNAN1, MR. R.TRINADH 2 1 M. Tech Student, 2 M. Tech., Assistant Professor, Dept. Of E.C.E, SIR C.R. Reddy college
More informationBasic Compression Library
Basic Compression Library Manual API version 1.2 July 22, 2006 c 20032006 Marcus Geelnard Summary This document describes the algorithms used in the Basic Compression Library, and how to use the library
More informationA Comparative Study Of Text Compression Algorithms
International Journal of Wisdom Based Computing, Vol. 1 (3), December 2011 68 A Comparative Study Of Text Compression Algorithms Senthil Shanmugasundaram Department of Computer Science, Vidyasagar College
More informationModeling Delta Encoding of Compressed Files
Modeling Delta Encoding of Compressed Files EXTENDED ABSTRACT S.T. Klein, T.C. Serebro, and D. Shapira 1 Dept of CS Bar Ilan University Ramat Gan, Israel tomi@cs.biu.ac.il 2 Dept of CS Bar Ilan University
More informationDepartment of electronics and telecommunication, J.D.I.E.T.Yavatmal, India 2
IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY LOSSLESS METHOD OF IMAGE COMPRESSION USING HUFFMAN CODING TECHNIQUES Trupti S Bobade *, Anushri S. sastikar 1 Department of electronics
More informationYou can say that again! Text compression
Activity 3 You can say that again! Text compression Age group Early elementary and up. Abilities assumed Copying written text. Time 10 minutes or more. Size of group From individuals to the whole class.
More informationA Method for Virtual Extension of LZW Compression Dictionary
A Method for Virtual Extension of Compression Dictionary István Finta, Lóránt Farkas, Sándor Szénási and Szabolcs Sergyán Technology and Innovation, Nokia Networks, Köztelek utca 6, Budapest, Hungary Email:
More informationSource Coding Basics and Speech Coding. Yao Wang Polytechnic University, Brooklyn, NY11201
Source Coding Basics and Speech Coding Yao Wang Polytechnic University, Brooklyn, NY1121 http://eeweb.poly.edu/~yao Outline Why do we need to compress speech signals Basic components in a source coding
More informationImage coding and compression
Image coding and compression Robin Strand Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Today Information and Data Redundancy Image Quality Compression Coding
More informationOPTIMIZATION OF LZW (LEMPELZIVWELCH) ALGORITHM TO REDUCE TIME COMPLEXITY FOR DICTIONARY CREATION IN ENCODING AND DECODING
Asian Journal Of Computer Science And Information Technology 2: 5 (2012) 114 118. Contents lists available at www.innovativejournal.in Asian Journal of Computer Science and Information Technology Journal
More informationIn this simple example, it is quite clear that there are exactly two strings that match the above grammar, namely: abc and abcc
JavaCC: LOOKAHEAD MiniTutorial 1. WHAT IS LOOKAHEAD The job of a parser is to read an input stream and determine whether or not the input stream conforms to the grammar. This determination in its most
More informationVC 12/13 T16 Video Compression
VC 12/13 T16 Video Compression Mestrado em Ciência de Computadores Mestrado Integrado em Engenharia de Redes e Sistemas Informáticos Miguel Tavares Coimbra Outline The need for compression Types of redundancy
More informationAn Advanced Text Encryption & Compression System Based on ASCII Values & Arithmetic Encoding to Improve Data Security
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 10, October 2014,
More informationIMAGE COMPRESSION TECHNIQUES
IMAGE COMPRESSION TECHNIQUES A.VASANTHAKUMARI, M.Sc., M.Phil., ASSISTANT PROFESSOR OF COMPUTER SCIENCE, JOSEPH ARTS AND SCIENCE COLLEGE, TIRUNAVALUR, VILLUPURAM (DT), TAMIL NADU, INDIA ABSTRACT A picture
More informationOptimization of Bit Rate in Medical Image Compression
Optimization of Bit Rate in Medical Image Compression Dr.J.Subash Chandra Bose 1, Mrs.Yamini.J 2, P.Pushparaj 3, P.Naveenkumar 4, Arunkumar.M 5, J.Vinothkumar 6 Professor and Head, Department of CSE, Professional
More informationIMAGE COMPRESSION I. Week VIII Feb /25/2003 Image CompressionI 1
IMAGE COMPRESSION I Week VIII Feb 25 02/25/2003 Image CompressionI 1 Reading.. Chapter 8 Sections 8.1, 8.2 8.3 (selected topics) 8.4 (Huffman, runlength, lossless predictive) 8.5 (lossy predictive,
More informationTextual Data Compression Speedup by Parallelization
Textual Data Compression Speedup by Parallelization GORAN MARTINOVIC, CASLAV LIVADA, DRAGO ZAGAR Faculty of Electrical Engineering Josip Juraj Strossmayer University of Osijek Kneza Trpimira 2b, 31000
More informationOptimal Parsing. In DictionarySymbolwise. Compression Algorithms
Università degli Studi di Palermo Facoltà Di Scienze Matematiche Fisiche E Naturali Tesi Di Laurea In Scienze Dell Informazione Optimal Parsing In DictionarySymbolwise Compression Algorithms Il candidato
More informationHorn Formulae. CS124 Course Notes 8 Spring 2018
CS124 Course Notes 8 Spring 2018 In today s lecture we will be looking a bit more closely at the Greedy approach to designing algorithms. As we will see, sometimes it works, and sometimes even when it
More informationCS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding
CS6B Handout 34 Autumn 22 November 2 th, 22 Data Compression and Huffman Encoding Handout written by Julie Zelenski. In the early 98s, personal computers had hard disks that were no larger than MB; today,
More informationSo, what is data compression, and why do we need it?
In the last decade we have been witnessing a revolution in the way we communicate 2 The major contributors in this revolution are: Internet; The explosive development of mobile communications; and The
More informationDigital Image Processing
Digital Image Processing Image Compression Caution: The PDF version of this presentation will appear to have errors due to heavy use of animations Material in this presentation is largely based on/derived
More informationData Compression Fundamentals
1 Data Compression Fundamentals Touradj Ebrahimi Touradj.Ebrahimi@epfl.ch 2 Several classifications of compression methods are possible Based on data type :» Generic data compression» Audio compression»
More informationImplementation of Robust Compression Technique using LZ77 Algorithm on Tensilica s Xtensa Processor
2016 International Conference on Information Technology Implementation of Robust Compression Technique using LZ77 Algorithm on Tensilica s Xtensa Processor Vasanthi D R and Anusha R M.Tech (VLSI Design
More informationInternational Journal of Advanced Research in Computer Science and Software Engineering
ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: Enhanced LZW (LempelZivWelch) Algorithm by Binary Search with
More information6. Finding Efficient Compressions; Huffman and HuTucker
6. Finding Efficient Compressions; Huffman and HuTucker We now address the question: how do we find a code that uses the frequency information about k length patterns efficiently to shorten our message?
More informationFormal Languages and Compilers Lecture VI: Lexical Analysis
Formal Languages and Compilers Lecture VI: Lexical Analysis Free University of BozenBolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/ artale/ Formal
More informationMore Bits and Bytes Huffman Coding
More Bits and Bytes Huffman Coding Encoding Text: How is it done? ASCII, UTF, Huffman algorithm ASCII C A T Lawrence Snyder, CSE UTF8: All the alphabets in the world Uniform Transformation Format: a variablewidth
More information15 July, Huffman Trees. Heaps
1 Huffman Trees The Huffman Code: Huffman algorithm uses a binary tree to compress data. It is called the Huffman code, after David Huffman who discovered d it in 1952. Data compression is important in
More information
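The list-update step of move-to-front coding described above can be sketched as follows. This is a minimal illustration, not the author's implementation: the alphabet and the sample input are made up for the example, and the variable-length binary list that would encode the resulting indices is not reproduced here.

```python
def mtf_encode(text, alphabet):
    # Work on a copy so the caller's alphabet is untouched.
    table = list(alphabet)
    out = []
    for ch in text:
        idx = table.index(ch)            # position in the current list
        out.append(idx)
        table.insert(0, table.pop(idx))  # move the symbol to the front
    return out

def mtf_decode(codes, alphabet):
    # Mirror the encoder: look up by index, then move to front.
    table = list(alphabet)
    out = []
    for idx in codes:
        ch = table[idx]
        out.append(ch)
        table.insert(0, table.pop(idx))
    return "".join(out)

# Frequently repeated symbols collapse to small indices (often 0),
# which a variable-length binary list then represents with few bits.
codes = mtf_encode("bananaaa", "abcdefghijklmnopqrstuvwxyz")
print(codes)  # [1, 1, 13, 1, 1, 1, 0, 0]
```

Note that the output is dominated by small indices: once a symbol has moved to the front, each immediate repetition costs index 0, which the shortest codeword in the binary list can represent.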