Data Compression. Guest lecture, SGDS Fall 2011
- Rebecca Price
1 Data Compression. Guest lecture, SGDS Fall 2011
2 Basics Lossy/lossless Alphabet compaction Compression is impossible Compression is possible RLE Variable-length codes Undecidable Pigeon-holes Patterns Randomness Huffman Arithmetic coding Using phrases Dynamic context Ziv-Lempel Burrows-Wheeler Suffix sorting Data compression is not a traditional algorithms-course topic, but it is interesting, both in itself and as an application of algorithms and data structures. Book: fragments, not that well chosen from a compression expert's view. This lecture: a fuller view, with connections to what you learned in the course.
3 Basic model bitstream B Compress compressed version C(B) Expand original bitstream B Basic model for data compression. The original message consists of characters, pixels, sound samples or whatever. In much of the lecture we assume that it consists of characters, but more generally we can view it as just a stream of bits, because all data representations can be broken down to bits. A compression method is two algorithms: compress and expand. It seems impossible that you could get the original back; surely you would have to throw away some data. And sometimes you do.
4 Lossy Compress Compressed message Expand Images, video, sound, ... If we accept loss, which we can do for some kinds of data, it's more believable that we can compress.
5 Lossless Compress Compressed message Expand Anything, including text, machine code, ... This lecture (and book): lossless only. But there are also lossless methods, which reproduce the original exactly. Lossless techniques are useful also inside lossy methods: even when accepting loss, you want to represent the exact information as compactly as possible. One case where compression is fairly easy is when there are unused bits in B, i.e., it does not store the data as compactly as it could.
6 Easy: alphabet compaction. Genome: a string over the alphabet { A, C, T, G }. Encode an N-character genome: ATAGATGCATAG... As ASCII bytes: 8 bits per character. With a 2-bit encoding: char encoding A 00, C 01, T 10, G 11. That's nice, but in general, there are no unused bits.
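As a minimal sketch of this idea, the following packs a { A, C, T, G } string four characters per byte using the 2-bit table above (the sentinel-bit trick for preserving leading zeros is my own addition, not from the slide):

```python
# 2-bit alphabet compaction for a genome string.
# Code table from the slide: A=00, C=01, T=10, G=11.
CODE = {"A": 0b00, "C": 0b01, "T": 0b10, "G": 0b11}
CHAR = {v: k for k, v in CODE.items()}

def compact(genome: str) -> bytes:
    """Pack 4 characters per byte instead of 1 character per ASCII byte."""
    bits = 0
    for ch in genome:
        bits = (bits << 2) | CODE[ch]
    # Prepend a sentinel 1-bit so leading A's (00) are not lost.
    bits |= 1 << (2 * len(genome))
    return bits.to_bytes((2 * len(genome)) // 8 + 1, "big")

def expand(packed: bytes) -> str:
    bits = int.from_bytes(packed, "big")
    chars = []
    while bits > 1:                 # stop when only the sentinel bit remains
        chars.append(CHAR[bits & 0b11])
        bits >>= 2
    return "".join(reversed(chars))
```

A 12-character genome then occupies 4 bytes (including the sentinel) instead of 12 ASCII bytes.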
7 But, in general: any representable data may appear. No superfluous bits to remove.
8 Computational formulation. Compress Input: N-bit message B. Output: the smallest possible program, C(B), that produces B as output (when given no input). Expand: run C(B), get B. The length of C(B) is the Kolmogorov complexity of B. UNDECIDABLE. The most general kind of code is a programming language. Let's say that C(B) is a program that produces B, and let's find the smallest such program. Undecidable: there is no, and can be no, algorithm that computes it in general. Generally, one should not be too discouraged; sometimes a non-general algorithm is useful. But let's make this easier, by requiring not that C(B) is the smallest possible, but just that it is smaller than B.
9 New attempt: skip smallest possible. Compress Input: N-bit message B. Output: N′-bit message C(B), N′ < N. Expand Input: N′-bit message C(B). Output: N-bit message B. IMPOSSIBLE. Why is this impossible? The pigeon-hole principle applies.
10 B: 2^N possibilities. C(B): 2^N′ possibilities. Compress: compression means mapping each dot on the left to some dot on the right. Since there are fewer possibilities for C(B) than for B, there are some B1 and B2 for which C(B1) = C(B2). This is easy to see when N is 2 or 3 or so, but don't get fooled: it applies even if N is billions.
11 B: 2^N possibilities. C(B): 2^N′ possibilities. Expand? Expand cannot choose between B1 and B2.
12 So, we give up? Some of the 2^N messages may be illegal; no need to encode them. Even if they are all legal, some are more probable than others.
13 Modified goal. Compress Input: N-bit message B. Output: N′-bit message C(B); N′ < N for the most common instances of B. For less common B: OK if N′ > N. Expand Input: N′-bit message C(B). Output: N-bit message B. A little vague, not really a mathematical definition. We would need some more information theory to make a formal definition, which is beyond the scope of this lecture.
14 Example. LEFT: Mary had a little lamb. RIGHT: hsy, iimlh kwvsadjh h.j. Text (upper/lower case letters + punctuation), 6 bits/char: 23 × 6 bits = 138 bits. Compressed, text can use fewer, say 2.5 bits/char, because text patterns are predictable: 23 × 2.5 bits = 57.5 bits. The random string uncompressed is the same length: 23 × 6 bits = 138 bits. Compressed, this data (with no predictable patterns) will use more, say 6.8 bits/char: 23 × 6.8 bits = 156.4 bits. So without compression, just alphabet compaction, we get 138 bits on the left; compressed, we might get, e.g., 57.5. On the right, we must allow a little more than uncompressed. So, how then? We have to find predictable patterns.
15 3.14159265358979... the first decimals of π. No normal compression method finds this pattern. Compression models are all based on repetition and/or skewed distribution. If we don't have special knowledge (of π in this case), the message looks random.
16 Randomness. A message that looks random will not be compressed. A sequence that is truly random cannot be compressed (pigeon-holes again). Maximum-compressed data looks random. Looking random depends on the model used; every compression method has one, explicit or implicit. Now let's look at a message where we can easily see some pattern.
17 Run-length encoding (RLE). How would you compress a bit sequence consisting of a few long runs of 0s and 1s? If you would just describe this bit sequence, you would say something like: so many 0s, then so many 1s, and so on. Let's use that as a compression format. To make it into a bit string, we need to encode the run lengths in binary too (next slide). With 4-bit counts, the slide's 40-bit example of four runs becomes 4 × 4 = 16 bits: this compressed 40 bits into 16 bits. What compression do we generally get with this method?
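The scheme above can be sketched as follows; a minimal RLE for bit strings with fixed-width counts. The conventions (runs start with 0s; an over-long run is split by a zero-length run of the opposite bit) are my assumptions, not stated on the slide:

```python
# Run-length encode a bit string using fixed-width binary run counts.
# Convention: runs alternate starting with 0s; a run longer than the
# maximum count is split by emitting a zero-length opposite run.
def rle_compress(bits: str, width: int = 4) -> str:
    out, run_char, run_len = [], "0", 0
    max_run = (1 << width) - 1
    for b in bits:
        if b == run_char and run_len < max_run:
            run_len += 1
        elif b == run_char:                 # run overflow: emit max, then 0
            out += [format(max_run, f"0{width}b"), format(0, f"0{width}b")]
            run_len = 1
        else:                               # run ends: emit its length
            out.append(format(run_len, f"0{width}b"))
            run_char, run_len = b, 1
    out.append(format(run_len, f"0{width}b"))
    return "".join(out)

def rle_expand(code: str, width: int = 4) -> str:
    out, bit = [], "0"
    for i in range(0, len(code), width):
        out.append(bit * int(code[i:i + width], 2))   # run of current bit
        bit = "1" if bit == "0" else "0"              # alternate
    return "".join(out)
```

For example, a 40-bit input with runs of 15, 7, 7 and 11 compresses to the 16 bits 1111 0111 0111 1011, while an alternating sequence like 10101 expands to 24 bits, illustrating the worst case of the next slides.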
18 Decimal Binary
19 RLE compression efficiency. What sequence gives the best compression? Maximal runs (length 15): 4/15 bits/bit. Worst compression? Runs of length 1 (alternating bits): 4/1 = 4 bits/bit. More (than 4) bits for lengths: better best case, worse worst case. Used as a component in some systems, but not a good general compression scheme. Let's look at a text example, with a more intricate pattern.
20 ABRACADABRA! First attempt: alphabet compaction. char encoding A 000, B 001, C 010, D 011, R 100, ! 101. Encoding: 12 × 3 bits = 36 bits. But do we have to use the same number of bits for all characters?
21 ABRACADABRA! char encoding A 0, B 1, C 01, D 10, R 00, ! 11. Won't work! (why not?) Can a variable-length code work? Yes! If it is prefix-free. Encoding.
22 ABRACADABRA! Try variable lengths, with short codewords for common characters. char encoding A 0, B 1111, C 110, D 100, R 1110, ! 101. 30 bits total, less than 36! So, we seem to have found a trick. Let's look at a more intuitive way to represent this code.
23 Tree representation. Codeword table: key (A B C D R !), value (the codeword). Compressed bitstring. Trie representation: 0 on left edges, 1 on right edges; the characters sit at the leaves (left to right: A; D, !; C; R, B). Compress: start at the leaf; follow the path up to the root; print the bits in reverse. Expand: start at the root; go left if the bit is 0, go right if 1; at a leaf node, print the char and return to the root. But: how do we find the best code? Code in the book. Now, how do we make the best use of this trick?
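The expand procedure just described can be sketched in a few lines. The tree below is my reading of the slide's trie (leaves A; D, !; C; R, B), so treat the exact shape as an assumption; the traversal logic is the point:

```python
# Decode a prefix-free code by walking a binary trie.
# Internal nodes are (left, right) pairs; leaves are characters.
# Tree assumed from the slide: A=0, D=100, !=101, C=110, R=1110, B=1111.
tree = ("A", (("D", "!"), ("C", ("R", "B"))))

def expand(bits: str, root):
    out, node = [], root
    for b in bits:
        node = node[int(b)]          # 0 = go left, 1 = go right
        if isinstance(node, str):    # leaf: emit the char, restart at root
            out.append(node)
            node = root
    return "".join(out)
```

For instance, the bit string 0 100 1110 decodes to ADR; because the code is prefix-free, no lookahead or separators are needed.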
24 Huffman code. Count frequencies of characters. Make a set with one node for each letter. Extract the two nodes with smallest frequency. Combine them, with a new node as root. Add the new root node to the set. Repeat, until only one node remains. (Optimality proof: see book.)
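The steps above can be sketched directly with a binary min-heap; this is a schematic implementation of the construction, not the book's code (the insertion counter is my tie-breaking device):

```python
import heapq
from collections import Counter

def huffman_code(message: str) -> dict:
    """Build a Huffman code for the characters of message."""
    freq = Counter(message)
    # Heap entries are (frequency, insertion-order tiebreak, subtree).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    n = len(heap)
    if n == 1:                           # degenerate one-symbol alphabet
        return {heap[0][2]: "0"}
    while len(heap) > 1:                 # repeatedly combine two smallest
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, n, (t1, t2)))
        n += 1
    code = {}
    def walk(node, prefix):              # read codewords off the tree
        if isinstance(node, str):
            code[node] = prefix
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2], "")
    return code
```

On ABRACADABRA! any Huffman tree encodes the message in 28 bits total, regardless of how frequency ties are broken.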
25 Huffman code construction for A B R A C A D A B R A ! char freq: A 5, B 2, R 2, C 1, D 1, ! 1 (total 12). The little red numbers in the diagram are the frequencies.
26 Huffman code. Compress N characters, alphabet size R. Data structure(s)? Time complexity? Count frequencies: N. Build a binary min-heap on frequency: R. (R − 1) combining steps, each extracting two nodes and inserting one: R lg R. Alternatively, use two FIFO queues Q1, Q2: sort on frequency and insert into Q1 in frequency order: sort-time(R, values 0..N). The minimum-frequency node is always next to get from either Q1 or Q2; insert new nodes into Q2: sort-time(R, values 0..N). Is that R lg R? No, key-indexed sorting can normally get it down to R. But how does expand know what the encoding is?
27 The compressed message must include the code: the codeword of each character (book); or the frequency of each character (expand builds the tree in the same way as compress); or the length of the codeword for each character (enough info to rebuild the tree). Note: Huffman can automatically compact the alphabet. No problem if the alphabet is relatively small. If we don't include characters with zero frequency in the code, we get natural compaction. Many descriptions stop here: we found the optimal way to compress! But we are far from it.
28 The curse of whole-bit codewords. Huffman-encoding characters is not always the best we can do. Example: a 1000-char message with a highly skewed distribution. char freq encoding: A 990/1000 0; B 7/1000 10; C 3/1000 11. Total: 990 × 1 + 10 × 2 = 1010 bits. RLE would do better! How can we do better? One way is to use another alphabet.
29 Use double characters. char freq (computed): AA (990/1000)(990/1000); AB (990/1000)(7/1000); AC (990/1000)(3/1000); BA (7/1000)(990/1000); BB (7/1000)(7/1000); BC (7/1000)(3/1000); CA (3/1000)(990/1000); CB (3/1000)(7/1000); CC (3/1000)(3/1000). Total: ca 600 bits.
30 Keep expanding the alphabet. Combining three characters, to alphabet size 27, improves precision further. Etc. Finally: combine all N characters, so the message is one single character. Arithmetic coding: the arithmetic encoder takes one frequency interval at a time and outputs bits as soon as they can be determined. We do not go into the details of how to do arithmetic coding in practice; just please accept that the problem has a solution.
31 Entropy coding Huffman, Shannon-Fano, canonical code, arithmetic coding techniques exist to output right number of bits, with sufficient precision For details, see e.g. Witten, Moffat, & Bell, Managing Gigabytes 31
32 But, wait a minute. char freq (computed): AA (990/1000)² ≈ .98; AB (990/1000)(7/1000) ≈ .0069; AC (990/1000)(3/1000) ≈ .0030; BA (7/1000)(990/1000) ≈ .0069; BB (7/1000)²; BC (7/1000)(3/1000); CA (3/1000)(990/1000) ≈ .0030; CB (3/1000)(7/1000); CC (3/1000)². These are clearly not the best frequency estimates. For instance, in English, "th" is more common than "ht". We can get more data from the original message.
33 Idea I: Statistics with context Example: in English, the letter u is not among the most common few except after q, where it is by far the most common! Idea: use different frequency tables based on the previous character 33
34 ABRACADABRA! After A: char freq A 0, B 2, C 1, D 1, R 0, ! 1. After B: A 0, B 0, C 0, D 0, R 2, ! 0. After C: A 1, B 0, C 0, D 0, R 0, ! 0. Build, e.g., different Huffman codes for each context.
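Collecting the per-context tables above is a one-pass count over character pairs; a minimal sketch of the order-1 statistics (the function name is mine):

```python
from collections import Counter, defaultdict

def context_stats(message: str):
    """One frequency table per one-character context (the preceding char)."""
    tables = defaultdict(Counter)
    for prev, ch in zip(message, message[1:]):
        tables[prev][ch] += 1
    return tables
```

On ABRACADABRA! this reproduces the slide's tables: after A the counts are B 2, C 1, D 1, ! 1; after B only R 2; after C only A 1. Each table would then drive its own Huffman (or arithmetic) code.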
35 More detailed contexts. Example: after "compres", s is overrepresented. Use longer strings as context: those significant in the message. Problem: lots of codes! Do they need to be included in the compressed message? Solution: dynamic contexts.
36 Context tree for the string letlettertele. (For instance, the context "et" has appeared 2 times.)
37 Dynamic context modeling. Start with just one (or R) contexts, with equal entries in the frequency tables. Add contexts and update statistics one character at a time. Build them exactly the same way in expand as in compress: no code needs to be included in the compressed message! Prediction by partial matching (PPM), Dynamic Markov Chaining (DMC). Good compression properties, but they take much computation in both compress and expand.
38 Idea 2: Build dictionaries Instead of individual characters, encode phrases Computationally simpler than statistical modeling Less sensitive to lack of precision in bit codes (alphabet is large) Dictionary methods are equivalent to (weird) special cases of statistical models 38
39 LZ77. The compressed message consists of triples <pos, length, next>: pos is the position (counting backwards) of the phrase, length is the number of characters in the phrase, and next is the first character after the phrase.
40 <0,0,a> <0,0,b> <2,1,a> <3,2,b> <5,3,b> <6,6,b> Expand: abaababaabbabaabbb. Considered impractical for years, because scanning for the longest match during compression takes N² time. But does it? Design the compression algorithm! Data structures? Time complexity?
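A straightforward (deliberately naive, quadratic-scan) sketch of both directions, matching the triple format above; note that a phrase may overlap the current position, which expand handles by copying one character at a time:

```python
def lz77_compress(msg: str):
    """Greedy LZ77: longest backward match at each position (naive N^2 scan)."""
    out, i = [], 0
    while i < len(msg):
        best_pos = best_len = 0
        for pos in range(1, i + 1):          # try every backward offset
            length = 0
            # The phrase may overlap position i (self-referencing copy).
            while (i + length < len(msg) - 1
                   and msg[i + length] == msg[i + length - pos]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        out.append((best_pos, best_len, msg[i + best_len]))
        i += best_len + 1
    return out

def lz77_expand(triples):
    out = []
    for pos, length, nxt in triples:
        for _ in range(length):              # copy, allowing overlap
            out.append(out[-pos])
        out.append(nxt)
    return "".join(out)
```

Expanding the slide's six triples yields abaababaabbabaabbb; the data-structure question on the slide is exactly how to replace the inner scan with something faster.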
41 Idea 3: Block sorting. Group characters in the output according to their contexts: the more similar the contexts, the closer together. This generates repetitions, which are easier to compress.
42 Idea 3: Block sorting. In a chunk of the message, sort all strings (contexts). Encode characters in their sorted-context order: lots of repetition. Then compress with RLE and/or move-to-front. Remarkably, it's easy to get the original order back! Burrows-Wheeler transform (BWT). Contexts are strings, so we can use string sorting for grouping/ordering.
43 Note on backward contexts. The string after a character works as context (just as well as the string before): after "compres", s is overrepresented; before "ompress", c is overrepresented.
44 abraca. Sort the rotations. Encode the row of the original message. Encode the last characters of the rows. row 0 aabrac; 1 abraca; 2 acaabr; 3 bracaa; 4 caabra; 5 racaab. Transformed message: <1, caraab> 44
45 Expand row 0 c 1 a 2 r 3 a 4 a 5 b 45
46 Expand row 0 a c 1 a a 2 a r 3 b a 4 c a 5 r b 46
47 Expand row 0 a c 1 a a 2 a r 3 b a 4 c a 5 r b rotated ca aa ra ab ac br sorted on second character 47
48 Expand row 0 a c 1 a a 2 a r 3 b a 4 c a 5 r b rotated ca aa ra ab ac br T sorted on second character 48
49 Expand row 0 a c 1 a a 2 a r 3 b a 4 c a 5 r b rotated ca aa ra ab ac br T a 49
50 Expand row 0 a c 1 a ca 2 a r 3 b a 4 c a 5 r b rotated ca aa ra ab ac br T ca 50
51 Expand row 0 a c 1 a aca 2 a r 3 b a 4 c a 5 r b rotated ca aa ra ab ac br T aca 51
52 Expand row 0 a c 1 a raca 2 a r 3 b a 4 c a 5 r b rotated ca aa ra ab ac br T raca 52
53 Expand row 0 a c 1 abraca 2 a r 3 b a 4 c a 5 r b rotated ca aa ra ab ac br T braca 53 Expand is quick, linear time. Compress is heavier, because of rotation sorting.
54 Expand row 0 a c 1 abraca 2 a r 3 b a 4 c a 5 r b rotated ca aa ra ab ac br T abraca 54
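The transform and its inversion, as walked through above, can be sketched compactly. The inverse uses the standard observation that stably sorting the positions of the last column gives a "next row" map; note that without a unique end marker the transform is ambiguous for periodic strings, which is why the next slide adds $:

```python
def bwt(msg: str):
    """Burrows-Wheeler transform by sorting all rotations of msg."""
    rows = sorted(msg[i:] + msg[:i] for i in range(len(msg)))
    return rows.index(msg), "".join(row[-1] for row in rows)

def inverse_bwt(row: int, last: str) -> str:
    # next_row[k]: the row that follows row k when walking the rotations.
    # Stable sort of the last column's indices recovers this correspondence.
    next_row = sorted(range(len(last)), key=lambda j: last[j])
    out = []
    for _ in range(len(last)):
        row = next_row[row]
        out.append(last[row])
    return "".join(out)
```

For the slide's example, bwt("abraca") gives row 1 and the string caraab, and inverse_bwt(1, "caraab") restores abraca; the inversion is linear time, while the forward direction pays for the rotation sorting.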
55 Rotation sorting = suffix sorting. Add an implicit last character $, smallest in the alphabet. Sorting the rotations of abraca$ = sorting the suffixes of abraca$.
56 Suffix sorting, for the string b a b a a a a b c b a b a a a a a $
Suffixes:
0 babaaaabcbabaaaaa$
1 abaaaabcbabaaaaa$
2 baaaabcbabaaaaa$
3 aaaabcbabaaaaa$
4 aaabcbabaaaaa$
5 aabcbabaaaaa$
6 abcbabaaaaa$
7 bcbabaaaaa$
8 cbabaaaaa$
9 babaaaaa$
10 abaaaaa$
11 baaaaa$
12 aaaaa$
13 aaaa$
14 aaa$
15 aa$
16 a$
17 $
Sorted:
17 $
16 a$
15 aa$
14 aaa$
13 aaaa$
12 aaaaa$
3 aaaabcbabaaaaa$
4 aaabcbabaaaaa$
5 aabcbabaaaaa$
10 abaaaaa$
1 abaaaabcbabaaaaa$
6 abcbabaaaaa$
11 baaaaa$
2 baaaabcbabaaaaa$
9 babaaaaa$
0 babaaaabcbabaaaaa$
7 bcbabaaaaa$
8 cbabaaaaa$
BWT output: a a a a a b b a a b b a a a c $ a b
Space is linear, but the sorting sees quadratic data. A single comparison can take linear time, so a comparison-based algorithm has worst-case order of growth N² lg N.
57 Suffix sorting time complexity. Naive: at least N² in the worst case. Prefix doubling: N lg N. Suffix tree, recursive: N. Suffix sorting is the computationally heaviest part of BWT. Specialized methods exist that improve on the worst case.
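Putting the last two slides together: with the $ sentinel, the BWT falls straight out of the suffix array. This sketch uses the naive sort (the quadratic-comparison baseline the slide mentions); the faster constructions replace only suffix_array:

```python
def suffix_array(s: str):
    """Naive suffix sorting: direct comparison of suffix slices."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def bwt_from_sa(s: str) -> str:
    """BWT via the suffix array: output the character preceding each suffix."""
    assert s.endswith("$")                 # unique smallest end marker
    return "".join(s[i - 1] for i in suffix_array(s))  # s[-1] is $ itself
```

On the slide's example string this reproduces the BWT output shown there (aaaaabbaabbaaac$ab).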
More informationAlgorithms Dr. Haim Levkowitz
91.503 Algorithms Dr. Haim Levkowitz Fall 2007 Lecture 4 Tuesday, 25 Sep 2007 Design Patterns for Optimization Problems Greedy Algorithms 1 Greedy Algorithms 2 What is Greedy Algorithm? Similar to dynamic
More informationInformation Theory and Communication
Information Theory and Communication Shannon-Fano-Elias Code and Arithmetic Codes Ritwik Banerjee rbanerjee@cs.stonybrook.edu c Ritwik Banerjee Information Theory and Communication 1/12 Roadmap Examples
More informationECE 417 Guest Lecture Video Compression in MPEG-1/2/4. Min-Hsuan Tsai Apr 02, 2013
ECE 417 Guest Lecture Video Compression in MPEG-1/2/4 Min-Hsuan Tsai Apr 2, 213 What is MPEG and its standards MPEG stands for Moving Picture Expert Group Develop standards for video/audio compression
More informationA Comprehensive Review of Data Compression Techniques
Volume-6, Issue-2, March-April 2016 International Journal of Engineering and Management Research Page Number: 684-688 A Comprehensive Review of Data Compression Techniques Palwinder Singh 1, Amarbir Singh
More informationSo, what is data compression, and why do we need it?
In the last decade we have been witnessing a revolution in the way we communicate 2 The major contributors in this revolution are: Internet; The explosive development of mobile communications; and The
More informationCOMPRESSION OF SMALL TEXT FILES
COMPRESSION OF SMALL TEXT FILES Jan Platoš, Václav Snášel Department of Computer Science VŠB Technical University of Ostrava, Czech Republic jan.platos.fei@vsb.cz, vaclav.snasel@vsb.cz Eyas El-Qawasmeh
More informationADVANCED LOSSLESS TEXT COMPRESSION ALGORITHM BASED ON SPLAY TREE ADAPTIVE METHODS
ADVANCED LOSSLESS TEXT COMPRESSION ALGORITHM BASED ON SPLAY TREE ADAPTIVE METHODS RADU RĂDESCU, ANDREEA HONCIUC *1 Key words: Data compression, Splay Tree, Prefix, ratio. This paper presents an original
More information14.4 Description of Huffman Coding
Mastering Algorithms with C By Kyle Loudon Slots : 1 Table of Contents Chapter 14. Data Compression Content 14.4 Description of Huffman Coding One of the oldest and most elegant forms of data compression
More informationRepetition 1st lecture
Repetition 1st lecture Human Senses in Relation to Technical Parameters Multimedia - what is it? Human senses (overview) Historical remarks Color models RGB Y, Cr, Cb Data rates Text, Graphic Picture,
More informationIMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I
IMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I 1 Need For Compression 2D data sets are much larger than 1D. TV and movie data sets are effectively 3D (2-space, 1-time). Need Compression for
More informationChapter 5 VARIABLE-LENGTH CODING Information Theory Results (II)
Chapter 5 VARIABLE-LENGTH CODING ---- Information Theory Results (II) 1 Some Fundamental Results Coding an Information Source Consider an information source, represented by a source alphabet S. S = { s,
More informationLCP Array Construction
LCP Array Construction The LCP array is easy to compute in linear time using the suffix array SA and its inverse SA 1. The idea is to compute the lcp values by comparing the suffixes, but skip a prefix
More informationAnalysis of Algorithms
Algorithm An algorithm is a procedure or formula for solving a problem, based on conducting a sequence of specified actions. A computer program can be viewed as an elaborate algorithm. In mathematics and
More informationQuad-Byte Transformation as a Pre-processing to Arithmetic Coding
Quad-Byte Transformation as a Pre-processing to Arithmetic Coding Jyotika Doshi GLS Inst.of Computer Technology Opp. Law Garden, Ellisbridge Ahmedabad-380006, INDIA Savita Gandhi Dept. of Computer Science;
More informationCS : Data Structures
CS 600.226: Data Structures Michael Schatz Nov 16, 2016 Lecture 32: Mike Week pt 2: BWT Assignment 9: Due Friday Nov 18 @ 10pm Remember: javac Xlint:all & checkstyle *.java & JUnit Solutions should be
More informationAn Overview 1 / 10. CS106B Winter Handout #21 March 3, 2017 Huffman Encoding and Data Compression
CS106B Winter 2017 Handout #21 March 3, 2017 Huffman Encoding and Data Compression Handout by Julie Zelenski with minor edits by Keith Schwarz In the early 1980s, personal computers had hard disks that
More informationData Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.
Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 18 Tries Today we are going to be talking about another data
More informationA Comparative Study Of Text Compression Algorithms
International Journal of Wisdom Based Computing, Vol. 1 (3), December 2011 68 A Comparative Study Of Text Compression Algorithms Senthil Shanmugasundaram Department of Computer Science, Vidyasagar College
More informationAlphabet Partitioning Techniques for Semi-Adaptive Huffman Coding of Large Alphabets
Alphabet Partitioning Techniques for Semi-Adaptive Huffman Coding of Large Alphabets Dan Chen Yi-Jen Chiang Nasir Memon Xiaolin Wu Department of Computer and Information Science Polytechnic University
More informationAbdullah-Al Mamun. CSE 5095 Yufeng Wu Spring 2013
Abdullah-Al Mamun CSE 5095 Yufeng Wu Spring 2013 Introduction Data compression is the art of reducing the number of bits needed to store or transmit data Compression is closely related to decompression
More informationCourse notes for Data Compression - 2 Kolmogorov complexity Fall 2005
Course notes for Data Compression - 2 Kolmogorov complexity Fall 2005 Peter Bro Miltersen September 29, 2005 Version 2.0 1 Kolmogorov Complexity In this section, we present the concept of Kolmogorov Complexity
More informationUniversity of Waterloo CS240 Spring 2018 Help Session Problems
University of Waterloo CS240 Spring 2018 Help Session Problems Reminder: Final on Wednesday, August 1 2018 Note: This is a sample of problems designed to help prepare for the final exam. These problems
More informationDigital Image Processing
Lecture 9+10 Image Compression Lecturer: Ha Dai Duong Faculty of Information Technology 1. Introduction Image compression To Solve the problem of reduncing the amount of data required to represent a digital
More informationTextual Data Compression Speedup by Parallelization
Textual Data Compression Speedup by Parallelization GORAN MARTINOVIC, CASLAV LIVADA, DRAGO ZAGAR Faculty of Electrical Engineering Josip Juraj Strossmayer University of Osijek Kneza Trpimira 2b, 31000
More informationIntro. To Multimedia Engineering Lossless Compression
Intro. To Multimedia Engineering Lossless Compression Kyoungro Yoon yoonk@konkuk.ac.kr 1/43 Contents Introduction Basics of Information Theory Run-Length Coding Variable-Length Coding (VLC) Dictionary-based
More informationCS 206 Introduction to Computer Science II
CS 206 Introduction to Computer Science II 04 / 25 / 2018 Instructor: Michael Eckmann Today s Topics Questions? Comments? Balanced Binary Search trees AVL trees / Compression Uses binary trees Balanced
More informationFigure-2.1. Information system with encoder/decoders.
2. Entropy Coding In the section on Information Theory, information system is modeled as the generationtransmission-user triplet, as depicted in fig-1.1, to emphasize the information aspect of the system.
More informationCS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding
CS6B Handout 34 Autumn 22 November 2 th, 22 Data Compression and Huffman Encoding Handout written by Julie Zelenski. In the early 98s, personal computers had hard disks that were no larger than MB; today,
More informationArithmetic Coding. Arithmetic Coding
Contents Image Compression Lecture 3 Arithmetic Code Introduction to & Decoding Algorithm Generating a Binary Code for Huffman codes have to be an integral number of bits long, while the entropy value
More informationText Compression through Huffman Coding. Terminology
Text Compression through Huffman Coding Huffman codes represent a very effective technique for compressing data; they usually produce savings between 20% 90% Preliminary example We are given a 100,000-character
More informationCompression; Error detection & correction
Compression; Error detection & correction compression: squeeze out redundancy to use less memory or use less network bandwidth encode the same information in fewer bits some bits carry no information some
More informationChapter 9. Greedy Technique. Copyright 2007 Pearson Addison-Wesley. All rights reserved.
Chapter 9 Greedy Technique Copyright 2007 Pearson Addison-Wesley. All rights reserved. Greedy Technique Constructs a solution to an optimization problem piece by piece through a sequence of choices that
More informationCSED233: Data Structures (2017F) Lecture12: Strings and Dynamic Programming
(2017F) Lecture12: Strings and Dynamic Programming Daijin Kim CSE, POSTECH dkim@postech.ac.kr Strings A string is a sequence of characters Examples of strings: Python program HTML document DNA sequence
More informationGreedy Algorithms. Alexandra Stefan
Greedy Algorithms Alexandra Stefan 1 Greedy Method for Optimization Problems Greedy: take the action that is best now (out of the current options) it may cause you to miss the optimal solution You build
More information