Lossless compression II
|
|
- Timothy Hensley
- 5 years ago
- Views:
Transcription
1 Lossless II D 44 R 52 B 81 C 84 D 86 R 82 A 85 A 87 A 83 R 88 A 8A B 89 A 8B Symbol Probability Range a 0.2 [0.0, 0.2) e 0.3 [0.2, 0.5) i 0.1 [0.5, 0.6) o 0.2 [0.6, 0.8) u 0.1 [0.8, 0.9)! 0.1 [0.9, 1.0) CSCI 470: Web Science Keith Vertanen
2 Overview Flavors of StaBc vs. Dynamic vs. AdapBve Lempel- Ziv- Welch (LZW) Fixed- length Variable- length paqern StaBsBcal coding ArithmeBc coding PredicBon by ParBal Match (PPM) 2
3 StaBc Compression models Predefined map for all text, e.g. ASCII, Morse Code Not opbmal: different texts = different stabsbcs Dynamic Generate model based on text Requires inibal pass before can start Must transmit the model (e.g. Huffman coding) AdapBve More accurate modeling = beqer Decoding must start from beginning (e.g. LZW) 3
4 LZW Lempel- Ziv- Welch (LZW) Basis for many popular formats e.g. PNG, 7zip, gzip, jar, PDF, compress, pkzip, GIF, TIFF Algorithm basics: Maintain a table of fixed- length s for variable- length paqerns in the input Built progressively by compressor and expander Table does not need to be transmiqed! Table is of fixed size Entries for all single characters in alphabet Entries for longer subs encountered 4
5 Details: LZW example 7- bit ASCII characters, 8- bit For real use, 8- bit ~12- bit Alphabet: 128 characters longer s Codeword: 2- digit hex value = single characters, e.g. 41 = A, 52 = R 80 = end of file 81- FF = longer s, e.g. 81 = AB, 88 = ABR Employs lookahead character to add s 5
6 input A B R A C A D A B R A B R A B R A matches Compressing D 44 Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c R 52 Ini6al set of character s 6
7 input A B R A C A D A B R A B R A B R A matches A 41 D 44 R 52 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c Ini6al set of character s 7
8 input A B R A C A D A B R A B R A B R A matches A B D 44 R 52 BR 82 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c Ini6al set of character s 8
9 input A B R A C A D A B R A B R A B R A matches A B R D 44 R 52 BR 82 RA 83 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c Ini6al set of character s 9
10 input A B R A C A D A B R A B R A B R A matches A B R A D 44 R 52 BR 82 RA 83 AC 84 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c Ini6al set of character s 10
11 input A B R A C A D A B R A B R A B R A matches A B R A C A D D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c Ini6al set of character s 11
12 input A B R A C A D A B R A B R A B R A matches A B R A C A D AB D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c ABR 88 Ini6al set of character s 12
13 input A B R A C A D A B R A B R A B R A matches A B R A C A D AB RA D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c ABR 88 Ini6al set of character s RAB 89 13
14 input A B R A C A D A B R A B R A B R A matches A B R A C A D AB RA BR D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c ABR 88 Ini6al set of character s RAB 89 BRA 8A 14
15 input A B R A C A D A B R A B R A B R A matches A B R A C A D AB RA BR ABR D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c ABR 88 Ini6al set of character s RAB 89 BRA 8A ABRA 8B 15
16 input A B R A C A D A B R A B R A B R A matches A B R A C A D AB RA BR ABR A D 44 BR 82 RA 83 AC 84 CA 85 AD 86 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c R 52 Ini6al set of character s DA 87 ABR 88 RAB 89 BRA 8A ABRA 8B Input: 17 ASCII characters * 7 bits = 119 bits Output: 12 s * 8 bits = 96 bits Compression ratio: 82% 16
17 Compression data structure LZW uses two table operabons: Find longest- prefix match from current posibon Add entry with character added to longest match Trie data structure, "retrieval" Linked tree structure Path in tree defines Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c 17
18 D 44 R 52 B 81 C 84 D 86 R 82 A 85 A 87 A 83 R 88 A 8A B 89 A 8B AD 86 BR 82 DA 87 RA 83 ABR 88 AC 84 RAB 89 CA 85 BRA 8A 18
19 Java implementabon, LZW private static final int R = 256; private static final int L = 4096; private static final int W = 12; // number of input chars // number of s = 2^W // width public static void compress() { String input = BinaryStdIn.readString(); // read input as TST<Integer> st = new TST<Integer>(); // trie data structure for (int i = 0; i < R; i++) st.put("" + (char) i, i); int code = R+1; // s for single chars // R is for EOF } while (input.length() > 0) { String s = st.longestprefixof(input); BinaryStdOut.write(st.get(s), W); // write W- bit for s int t = s.length(); if (t < input.length() && code < L) st.put(input.sub(0, t + 1), code++); // add new input = input.sub(t); } BinaryStdOut.write(R, W); // write last BinaryStdOut.close(); 19
20 input output Expansion Write the current val Read x from input D 44 Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s R 52 Ini6al set of character s val old = A x = 42 s = B c = B next code = 81 val new = B 20
21 input output A D 44 R 52 Ini6al set of character s Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = B x = 52 s = R c = R next code = 82 val new = R 21
22 input output A B D 44 R 52 Ini6al set of character s BR 82 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = R x = 41 s = A c = A next code = 83 val new = A 22
23 input output A B R D 44 R 52 Ini6al set of character s BR 82 RA 83 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = A x = 43 s = C c = C next code = 84 val new = C 23
24 input output A B R A D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = C x = 41 s = A c = A next code = 85 val new = A 24
25 input output A B R A C D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = A x = 44 s = D c = D next code = 86 val new = D 25
26 input output A B R A C A D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = D x = 81 s = AB c = A next code = 87 val new = AB 26
27 input output A B R A C A D D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = AB x = 83 s = RA c = R next code = 88 val new = RA 27
28 input output A B R A C A D AB D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 ABR 88 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = RA x = 82 s = BR c = B next code = 89 val new = BR 28
29 input output A B R A C A D AB RA D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 ABR 88 RAB 89 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = BR x = 88 s = ABR c = A next code = 8A = RA val new 29
30 input output A B R A C A D AB RA BR D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 ABR 88 RAB 89 BRA 8A Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = ABR x = 41 s = A c = A next code = 8B = RA val new 30
31 input output A B R A C A D AB RA BR ABR D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 ABR 88 RAB 89 BRA ABRA 8A 8B Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = A x = 80 s = c = next code = val new = 31
32 input output A B R A C A D AB RA BR ABR A D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 ABR 88 RAB 89 BRA ABRA 8A 8B Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = x = s = c = next code = val new = 32
33 Expansion data structure For a W- bit, look up value An array of size 2 W D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s ABR 88 RAB 89 BRA ABRA 8A 8B 33
34 A tricky case: ABABABA input A B A B A B A matches Ini6al set of s 34
35 A tricky case: ABABABA input A B A B A B A matches A 41 Ini6al set of s 35
36 A tricky case: ABABABA input A B A B A B A matches A B BA 82 Ini6al set of s 36
37 A tricky case: ABABABA input A B A B A B A matches A B AB BA 82 ABA 83 Ini6al set of s 37
38 A tricky case: ABABABA input A B A B A B A matches A B AB ABA BA 82 ABA 83 NoBce right aler adding entry for "AB" the subsequent text had "AB" as a prefix. Ini6al set of s 38
39 A tricky case: expansion ABABABA input output Ini6al set of s 39
40 A tricky case: expansion ABABABA input output A Ini6al set of s 40
41 A tricky case: expansion ABABABA input output A B BA 82 Ini6al set of s 41
42 A tricky case: expansion ABABABA input output A B AB BA 82 We need to know the first character of 83 in order to enter 83 into the table! Fix: First character of 83 must be same as first leqer of current 81, i.e. "A". Ini6al set of s 42
43 A tricky case: expansion ABABABA input output A B AB ABA BA 82 ABA 83 Ini6al set of s 43
44 Java implementabon, LZW expansion private static final int R = 256; private static final int L = 4096; private static final int W = 12; // number of input chars // number of s = 2^W // width public static void expand() { String[] st = new String[L]; int i; // next available value // initialize symbol table with all 1- character s for (i = 0; i < R; i++) st[i] = "" + (char) i; st[i++] = ""; // (unused) lookahead for EOF int = BinaryStdIn.readInt(W); String val = st[]; } while (true) { BinaryStdOut.write(val); = BinaryStdIn.readInt(W); if ( == R) break; String s = st[]; if (i == ) s = val + val.charat(0); if (i < L) st[i++] = val + s.charat(0); val = s; } BinaryStdOut.close(); // special case hack 44
45 LZW decisions How big of a symbol table? How long is message? Whole message similar model? Many variabons What to do when symbol table fills up? Throw away and start over (e.g. GIF) Throw away when not effecbve (e.g. Unix compress) Many variabons Put longer subs in symbol table? Many variabons 45
46 Lempel- Ziv variants LZW variants LZ77, published Lempel & Ziv, 1977, not patented PNG LZ78, published Lempel & Ziv in 1978, patented LZW, Welch extension to LZ78, patented (expired 2003) GIF, TIFF, Pkzip Deflate = LZ77 variant + Huffman coding 7zip, gzip, jar, PDF 46
47 Some experiments Compressing Moby Dick Using Java programs for LZW and Huffman Different dicbonary sizes for LZW or reseqng when full Compared to zip and gzip 1,191,463 mobydick.txt 485,561 mobydick.gz 485,790 mobydick.zip 398,480 mobydick.xz 371,394 mobydick.bz2 667,651 mobydick.huff 597,060 mobydick.lzw 814,578 mobydick.huff.lzw 592,830 mobydick.lzw.huff 682,400 mobydick.lzw.reset 541,261 mobydick.lzw14 521,700 mobydick.lzw15 503,050 mobydick.lzw16 501,193 mobydick.lzw16.huff 521,700 mobydick.lzw17 514,393 mobydick.lzw18 571,548 mobydick.lzw20 47
48 Enter stabsbcal coding Natural language quite predictable ~ 1 bit of entropy per symbol Huffman coding sbll requires 1- bit min per symbol We're forced to use an integral number of bits DicBonary- based (LZW and friends) Memorizes sequences, but no nobon of frequency StaBsBcal coding Use long, specific context for predicbon P(A The_United_State_of_) =? Blend knowledge using contexts of different lengths AdapBve: model updates and change as text seen Olen aler every leqer! 48
49 MacKay, D.J.C. Informa6on Theory, Inference, and Learning Algorithms. 49
50 Guess the phrase 50
51 A simple unigram model of English ' </s> <sp> a b c d e f g h i j k l m n o p q r s t u v w x y z
52 From text to a real number ArithmeBc coding Message represented by real interval in [0, 1) More precise interval = specifies more bits Example e.g. [ , ) = "it was the best of Bmes" Or any number in that interval, Alphabet = {aeiou!} Transmission = eaii! Symbol Probability Range a 0.2 [0.0, 0.2) e 0.3 [0.2, 0.5) i 0.1 [0.5, 0.6) o 0.2 [0.6, 0.8) u 0.1 [0.8, 0.9)! 0.1 [0.9, 1.0) 52
53 Transmission = eaii! Bell, Cleary, WiKen. Text Compression. AEer seeing Range Symbol Probability Range [0.0, 1.0] a 0.2 [0.0, 0.2) e [0.2, 0.5) e 0.3 [0.2, 0.5) a [0.2, 0.26) i 0.1 [0.5, 0.6) i [0.23, 0.236) o 0.2 [0.6, 0.8) i [0.233, ) u 0.1 [0.8, 0.9)! [ , )! 0.1 [0.9, 1.0) Note: Encoder/decoder agree on symbol to terminate message, here we use! 53
54 Context modeling PredicBon by ParBal Match (PPM) P(next symbol) depends on previous symbol(s) Blending for dealing with 0- frequency problem Easy to build as adapbve Learn from the text as you go No trie to send as in Huffman coding 1984, but sbll compebbve for text Various variants, PPM- A/B/C/D/Z/* Implemented in 7- zip, open source packages, PPM for XML, PPM for executables, 54
55 PPM An alphabet A with q symbols Order = How many previous symbols to use 2 = condibon on 2 previous symbols, P(x n x n- 1, x n- 2 ) 1 = condibon on 1 previous symbol, P(x n x n- 1 ) 0 = condibon on current symbol, P(x n ) - 1 = uniform over the alphabet, P(x n ) = 1/q For a given order: Probability of next symbol based on counbng occurrences given prior context 55
56 Escape probabilibes Need weights to blend probabilibes Compute based on "escape" probability Allocate some mass in each order for when a lower- order model should make predicbon instead Method A: Add one to C 0 : count of characters seen in this context e o = 1 / (C 0 + 1) Method D: u = number of unique characters seen in this context e o = (u / 2) / C 0 56
57 Bell, Cleary, WiKen. Text Compression. 57
58 Lossless benchmarks year scheme bits /char 1967 ASCII Huffman LZ LZMW LZH move- to- front LZB gzip PPMC PPM Burrows- Wheeler BOA RK 1.89 Data using Calgary corpus 58
59 Compressing Moby Dick 1,191,463 mobydick.txt 485,561 mobydick.gz 485,790 mobydick.zip 398,480 mobydick.xz 371,394 mobydick.bz2 667,651 mobydick.huff 597,060 mobydick.lzw 814,578 mobydick.huff.lzw 592,830 mobydick.lzw.huff 682,400 mobydick.lzw.reset 541,261 mobydick.lzw14 521,700 mobydick.lzw15 503,050 mobydick.lzw16 501,193 mobydick.lzw16.huff 521,700 mobydick.lzw17 514,393 mobydick.lzw18 571,548 mobydick.lzw20 325,378 mobydick.ppmd 306,708 mobydick.zpaq 59
60 Summary DicBonary- based LZW and variants Memorizes sequences in data StaBsBcal coding Language model produces probabilibes Probability sequence defines point on [0, 1) Use arithmebc coding to convert to bits PredicBon by ParBal Match (PPM) 60
Lossless compression II
Lossless II D 44 R 52 B 81 C 84 D 86 R 82 A 85 A 87 A 83 R 88 A 8A B 89 A 8B Symbol Probability Range a 0.2 [0.0, 0.2) e 0.3 [0.2, 0.5) i 0.1 [0.5, 0.6) o 0.2 [0.6, 0.8) u 0.1 [0.8, 0.9)! 0.1 [0.9, 1.0)
More information5.5 Data Compression
5.5 Data Compression basics run-length coding Huffman compression LZW compression Algorithms, 4 th Edition Robert Sedgewick and Kevin Wayne Copyright 2002 2010 February 8, 2011 2:50:01 PM Data compression
More informationEntropy Coding. - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic Code
Entropy Coding } different probabilities for the appearing of single symbols are used - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic
More information5.5 Data Compression
5.5 Data Compression basics run-length encoding Huffman compression LZW compression Algorithms in Java, 4 th Edition Robert Sedgewick and Kevin Wayne Copyright 2009 January 26, 2010 8:42:01 AM Data compression
More informationAlgorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE
Algorithms ROBERT SEDGEWICK KEVIN WAYNE 5.5 DATA COMPRESSION Algorithms F O U R T H E D I T I O N introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu
More informationAlgorithms 5.5 DATA COMPRESSION. basics run-length coding Huffman compression LZW compression
5.5 DATA COMPRESSION Algorithms F O U R T H E D I T I O N basics run-length coding Huffman compression LZW compression R O B E R T S E D G E W I C K K E V I N W A Y N E Algorithms, 4 th Edition Robert
More informationCS 493: Algorithms for Massive Data Sets Dictionary-based compression February 14, 2002 Scribe: Tony Wirth LZ77
CS 493: Algorithms for Massive Data Sets February 14, 2002 Dictionary-based compression Scribe: Tony Wirth This lecture will explore two adaptive dictionary compression schemes: LZ77 and LZ78. We use the
More informationEE-575 INFORMATION THEORY - SEM 092
EE-575 INFORMATION THEORY - SEM 092 Project Report on Lempel Ziv compression technique. Department of Electrical Engineering Prepared By: Mohammed Akber Ali Student ID # g200806120. ------------------------------------------------------------------------------------------------------------------------------------------
More informationAlgorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE
Algorithms ROBERT SEDGEWICK KEVIN WAYNE 5.5 DATA COMPRESSION Algorithms F O U R T H E D I T I O N introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE https://algs4.cs.princeton.edu
More informationSimple variant of coding with a variable number of symbols and fixlength codewords.
Dictionary coding Simple variant of coding with a variable number of symbols and fixlength codewords. Create a dictionary containing 2 b different symbol sequences and code them with codewords of length
More informationData Compression. Guest lecture, SGDS Fall 2011
Data Compression Guest lecture, SGDS Fall 2011 1 Basics Lossy/lossless Alphabet compaction Compression is impossible Compression is possible RLE Variable-length codes Undecidable Pigeon-holes Patterns
More informationCS/COE 1501
CS/COE 1501 www.cs.pitt.edu/~lipschultz/cs1501/ Compression What is compression? Represent the same data using less storage space Can get more use out a disk of a given size Can get more use out of memory
More informationCS/COE 1501
CS/COE 1501 www.cs.pitt.edu/~nlf4/cs1501/ Compression What is compression? Represent the same data using less storage space Can get more use out a disk of a given size Can get more use out of memory E.g.,
More informationData Compression 신찬수
Data Compression 신찬수 Data compression Reducing the size of the representation without affecting the information itself. Lossless compression vs. lossy compression text file image file movie file compression
More informationMultimedia Systems. Part 20. Mahdi Vasighi
Multimedia Systems Part 2 Mahdi Vasighi www.iasbs.ac.ir/~vasighi Department of Computer Science and Information Technology, Institute for dvanced Studies in asic Sciences, Zanjan, Iran rithmetic Coding
More informationSIGNAL COMPRESSION Lecture Lempel-Ziv Coding
SIGNAL COMPRESSION Lecture 5 11.9.2007 Lempel-Ziv Coding Dictionary methods Ziv-Lempel 77 The gzip variant of Ziv-Lempel 77 Ziv-Lempel 78 The LZW variant of Ziv-Lempel 78 Asymptotic optimality of Ziv-Lempel
More informationCompression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints:
CS231 Algorithms Handout # 31 Prof. Lyn Turbak November 20, 2001 Wellesley College Compression The Big Picture We want to be able to store and retrieve data, as well as communicate it with others. In general,
More informationData compression with Huffman and LZW
Data compression with Huffman and LZW André R. Brodtkorb, Andre.Brodtkorb@sintef.no Outline Data storage and compression Huffman: how it works and where it's used LZW: how it works and where it's used
More informationCh. 2: Compression Basics Multimedia Systems
Ch. 2: Compression Basics Multimedia Systems Prof. Thinh Nguyen (Based on Prof. Ben Lee s Slides) Oregon State University School of Electrical Engineering and Computer Science Outline Why compression?
More informationData Compression Techniques
Data Compression Techniques Part 2: Text Compression Lecture 6: Dictionary Compression Juha Kärkkäinen 15.11.2017 1 / 17 Dictionary Compression The compression techniques we have seen so far replace individual
More informationLZW Compression. Ramana Kumar Kundella. Indiana State University December 13, 2014
LZW Compression Ramana Kumar Kundella Indiana State University rkundella@sycamores.indstate.edu December 13, 2014 Abstract LZW is one of the well-known lossless compression methods. Since it has several
More informationTHE RELATIVE EFFICIENCY OF DATA COMPRESSION BY LZW AND LZSS
THE RELATIVE EFFICIENCY OF DATA COMPRESSION BY LZW AND LZSS Yair Wiseman 1* * 1 Computer Science Department, Bar-Ilan University, Ramat-Gan 52900, Israel Email: wiseman@cs.huji.ac.il, http://www.cs.biu.ac.il/~wiseman
More informationComparative Study of Dictionary based Compression Algorithms on Text Data
88 Comparative Study of Dictionary based Compression Algorithms on Text Data Amit Jain Kamaljit I. Lakhtaria Sir Padampat Singhania University, Udaipur (Raj.) 323601 India Abstract: With increasing amount
More informationCIS 121 Data Structures and Algorithms with Java Spring 2018
CIS 121 Data Structures and Algorithms with Java Spring 2018 Homework 6 Compression Due: Monday, March 12, 11:59pm online 2 Required Problems (45 points), Qualitative Questions (10 points), and Style and
More informationITCT Lecture 8.2: Dictionary Codes and Lempel-Ziv Coding
ITCT Lecture 8.2: Dictionary Codes and Lempel-Ziv Coding Huffman codes require us to have a fairly reasonable idea of how source symbol probabilities are distributed. There are a number of applications
More informationDavid Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc.
David Rappaport School of Computing Queen s University CANADA Copyright, 1996 Dale Carnegie & Associates, Inc. Data Compression There are two broad categories of data compression: Lossless Compression
More informationDistributed source coding
Distributed source coding Suppose that we want to encode two sources (X, Y ) with joint probability mass function p(x, y). If the encoder has access to both X and Y, it is sufficient to use a rate R >
More informationEE67I Multimedia Communication Systems Lecture 4
EE67I Multimedia Communication Systems Lecture 4 Lossless Compression Basics of Information Theory Compression is either lossless, in which no information is lost, or lossy in which information is lost.
More informationCompressing Data. Konstantin Tretyakov
Compressing Data Konstantin Tretyakov (kt@ut.ee) MTAT.03.238 Advanced April 26, 2012 Claude Elwood Shannon (1916-2001) C. E. Shannon. A mathematical theory of communication. 1948 C. E. Shannon. The mathematical
More informationDictionary techniques
Dictionary techniques The final concept that we will mention in this chapter is about dictionary techniques. Many modern compression algorithms rely on the modified versions of various dictionary techniques.
More informationA Comparative Study Of Text Compression Algorithms
International Journal of Wisdom Based Computing, Vol. 1 (3), December 2011 68 A Comparative Study Of Text Compression Algorithms Senthil Shanmugasundaram Department of Computer Science, Vidyasagar College
More informationAlgorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE
lgorithms ROBERT SEDGEWICK KEVIN WYNE 5.5 DT COMPRESSION lgorithms F O U R T H E D I T I O N introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WYNE http://algs4.cs.princeton.edu
More informationFundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding
Fundamentals of Multimedia Lecture 5 Lossless Data Compression Variable Length Coding Mahmoud El-Gayyar elgayyar@ci.suez.edu.eg Mahmoud El-Gayyar / Fundamentals of Multimedia 1 Data Compression Compression
More informationAlgorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE
lgorithms OBET SEDGEWICK KEVIN WYNE 5.5 DT COMPESSION lgorithms F O U T H E D I T I O N introduction run-length coding Huffman compression LZW compression OBET SEDGEWICK KEVIN WYNE http://algs4.cs.princeton.edu
More information7: Image Compression
7: Image Compression Mark Handley Image Compression GIF (Graphics Interchange Format) PNG (Portable Network Graphics) MNG (Multiple-image Network Graphics) JPEG (Join Picture Expert Group) 1 GIF (Graphics
More informationImage coding and compression
Image coding and compression Robin Strand Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Today Information and Data Redundancy Image Quality Compression Coding
More informationData Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression
An overview of Compression Multimedia Systems and Applications Data Compression Compression becomes necessary in multimedia because it requires large amounts of storage space and bandwidth Types of Compression
More informationChapter 7 Lossless Compression Algorithms
Chapter 7 Lossless Compression Algorithms 7.1 Introduction 7.2 Basics of Information Theory 7.3 Run-Length Coding 7.4 Variable-Length Coding (VLC) 7.5 Dictionary-based Coding 7.6 Arithmetic Coding 7.7
More informationWIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION
WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION V.KRISHNAN1, MR. R.TRINADH 2 1 M. Tech Student, 2 M. Tech., Assistant Professor, Dept. Of E.C.E, SIR C.R. Reddy college
More informationData Compression. Media Signal Processing, Presentation 2. Presented By: Jahanzeb Farooq Michael Osadebey
Data Compression Media Signal Processing, Presentation 2 Presented By: Jahanzeb Farooq Michael Osadebey What is Data Compression? Definition -Reducing the amount of data required to represent a source
More informationIMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I
IMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I 1 Need For Compression 2D data sets are much larger than 1D. TV and movie data sets are effectively 3D (2-space, 1-time). Need Compression for
More informationLempel-Ziv-Welch (LZW) Compression Algorithm
Lempel-Ziv-Welch (LZW) Compression lgorithm Introduction to the LZW lgorithm Example 1: Encoding using LZW Example 2: Decoding using LZW LZW: Concluding Notes Introduction to LZW s mentioned earlier, static
More informationStudy of LZ77 and LZ78 Data Compression Techniques
Study of LZ77 and LZ78 Data Compression Techniques Suman M. Choudhary, Anjali S. Patel, Sonal J. Parmar Abstract Data Compression is defined as the science and art of the representation of information
More informationCh. 2: Compression Basics Multimedia Systems
Ch. 2: Compression Basics Multimedia Systems Prof. Ben Lee School of Electrical Engineering and Computer Science Oregon State University Outline Why compression? Classification Entropy and Information
More informationError Resilient LZ 77 Data Compression
Error Resilient LZ 77 Data Compression Stefano Lonardi Wojciech Szpankowski Mark Daniel Ward Presentation by Peter Macko Motivation Lempel-Ziv 77 lacks any form of error correction Introducing a single
More informationAn Advanced Text Encryption & Compression System Based on ASCII Values & Arithmetic Encoding to Improve Data Security
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 10, October 2014,
More informationTEXT COMPRESSION ALGORITHMS - A COMPARATIVE STUDY
S SENTHIL AND L ROBERT: TEXT COMPRESSION ALGORITHMS A COMPARATIVE STUDY DOI: 10.21917/ijct.2011.0062 TEXT COMPRESSION ALGORITHMS - A COMPARATIVE STUDY S. Senthil 1 and L. Robert 2 1 Department of Computer
More informationLossless Compression Algorithms
Multimedia Data Compression Part I Chapter 7 Lossless Compression Algorithms 1 Chapter 7 Lossless Compression Algorithms 1. Introduction 2. Basics of Information Theory 3. Lossless Compression Algorithms
More informationFPGA based Data Compression using Dictionary based LZW Algorithm
FPGA based Data Compression using Dictionary based LZW Algorithm Samish Kamble PG Student, E & TC Department, D.Y. Patil College of Engineering, Kolhapur, India Prof. S B Patil Asso.Professor, E & TC Department,
More informationarxiv: v2 [cs.it] 15 Jan 2011
Improving PPM Algorithm Using Dictionaries Yichuan Hu Department of Electrical and Systems Engineering University of Pennsylvania Email: yichuan@seas.upenn.edu Jianzhong (Charlie) Zhang, Farooq Khan and
More informationECE 499/599 Data Compression & Information Theory. Thinh Nguyen Oregon State University
ECE 499/599 Data Compression & Information Theory Thinh Nguyen Oregon State University Adminstrivia Office Hours TTh: 2-3 PM Kelley Engineering Center 3115 Class homepage http://www.eecs.orst.edu/~thinhq/teaching/ece499/spring06/spring06.html
More informationOverview. Last Lecture. This Lecture. Next Lecture. Data Transmission. Data Compression Source: Lecture notes
Overview Last Lecture Data Transmission This Lecture Data Compression Source: Lecture notes Next Lecture Data Integrity 1 Source : Sections 10.1, 10.3 Lecture 4 Data Compression 1 Data Compression Decreases
More informationS 1. Evaluation of Fast-LZ Compressors for Compacting High-Bandwidth but Redundant Streams from FPGA Data Sources
Evaluation of Fast-LZ Compressors for Compacting High-Bandwidth but Redundant Streams from FPGA Data Sources Author: Supervisor: Luhao Liu Dr. -Ing. Thomas B. Preußer Dr. -Ing. Steffen Köhler 09.10.2014
More informationCompression of Concatenated Web Pages Using XBW
Compression of Concatenated Web Pages Using XBW Radovan Šesták and Jan Lánský Charles University, Faculty of Mathematics and Physics, Department of Software Engineering Malostranské nám. 25, 118 00 Praha
More informationEngineering Mathematics II Lecture 16 Compression
010.141 Engineering Mathematics II Lecture 16 Compression Bob McKay School of Computer Science and Engineering College of Engineering Seoul National University 1 Lossless Compression Outline Huffman &
More informationA study in compression algorithms
Master Thesis Computer Science Thesis no: MCS-004:7 January 005 A study in compression algorithms Mattias Håkansson Sjöstrand Department of Interaction and System Design School of Engineering Blekinge
More informationIntroduction to Data Compression
Introduction to Data Compression Guillaume Tochon guillaume.tochon@lrde.epita.fr LRDE, EPITA Guillaume Tochon (LRDE) CODO - Introduction 1 / 9 Data compression: whatizit? Guillaume Tochon (LRDE) CODO -
More informationAn On-line Variable Length Binary. Institute for Systems Research and. Institute for Advanced Computer Studies. University of Maryland
An On-line Variable Length inary Encoding Tinku Acharya Joseph F. Ja Ja Institute for Systems Research and Institute for Advanced Computer Studies University of Maryland College Park, MD 242 facharya,
More informationGrayscale true two-dimensional dictionary-based image compression
J. Vis. Commun. Image R. 18 (2007) 35 44 www.elsevier.com/locate/jvci Grayscale true two-dimensional dictionary-based image compression Nathanael J. Brittain, Mahmoud R. El-Sakka * Computer Science Department,
More informationText Compression. Jayadev Misra The University of Texas at Austin July 1, A Very Incomplete Introduction To Information Theory 2
Text Compression Jayadev Misra The University of Texas at Austin July 1, 2003 Contents 1 Introduction 1 2 A Very Incomplete Introduction To Information Theory 2 3 Huffman Coding 5 3.1 Uniquely Decodable
More informationDEFLATE COMPRESSION ALGORITHM
DEFLATE COMPRESSION ALGORITHM Savan Oswal 1, Anjali Singh 2, Kirthi Kumari 3 B.E Student, Department of Information Technology, KJ'S Trinity College Of Engineering and Research, Pune, India 1,2.3 Abstract
More information11 Dictionary-based compressors
11 Dictionary-based compressors 11.1 LZ77... 11-1 11.2 LZ78... 11-4 11.3 LZW... 11-6 11.4 On the optimality of compressors... 11-7 The methods based on a dictionary take a totally different approach to
More informationMinification techniques
Minification techniques We have already discussed scaling images Enlarging an image well relies solely on good interpolation. We cannot add information to an image. Nearest neighbour will always give horrible
More information5.5 Data Compression. basics run-length coding Huffman compression LZW compression. Data compression
5.5 ata ompression ata compression ompression reduces the size of a file: To save space when storing it. To save time when transmitting it. Most files have lots of redundancy. basics run-length coding
More informationRepetition 1st lecture
Repetition 1st lecture Human Senses in Relation to Technical Parameters Multimedia - what is it? Human senses (overview) Historical remarks Color models RGB Y, Cr, Cb Data rates Text, Graphic Picture,
More informationLossless compression B 1 U 1 B 2 C R D! CSCI 470: Web Science Keith Vertanen
Lossless compression B U B U B 2 U B A ϵ CSCI 47: Web Science Keith Vertanen C R D! Lossless compression Mo7va7on Overview Rules and limits of the game Things to exploit Run- length encoding (RLE) Exploit
More information11 Dictionary-based compressors
11 Dictionary-based compressors 11.1 LZ77... 11-1 11.2 LZ78... 11-4 11.3 LZW... 11-6 11.4 On the optimality of compressors... 11-7 The methods based on a dictionary take a totally different approach to
More informationGUJARAT TECHNOLOGICAL UNIVERSITY
GUJARAT TECHNOLOGICAL UNIVERSITY INFORMATION TECHNOLOGY DATA COMPRESSION AND DATA RETRIVAL SUBJECT CODE: 2161603 B.E. 6 th SEMESTER Type of course: Core Prerequisite: None Rationale: Data compression refers
More informationG64PMM - Lecture 3.2. Analogue vs Digital. Analogue Media. Graphics & Still Image Representation
G64PMM - Lecture 3.2 Graphics & Still Image Representation Analogue vs Digital Analogue information Continuously variable signal Physical phenomena Sound/light/temperature/position/pressure Waveform Electromagnetic
More informationMODELING DELTA ENCODING OF COMPRESSED FILES. and. and
International Journal of Foundations of Computer Science c World Scientific Publishing Company MODELING DELTA ENCODING OF COMPRESSED FILES SHMUEL T. KLEIN Department of Computer Science, Bar-Ilan University
More informationModeling Delta Encoding of Compressed Files
Modeling Delta Encoding of Compressed Files EXTENDED ABSTRACT S.T. Klein, T.C. Serebro, and D. Shapira 1 Dept of CS Bar Ilan University Ramat Gan, Israel tomi@cs.biu.ac.il 2 Dept of CS Bar Ilan University
More informationAn Asymmetric, Semi-adaptive Text Compression Algorithm
An Asymmetric, Semi-adaptive Text Compression Algorithm Harry Plantinga Department of Computer Science University of Pittsburgh Pittsburgh, PA 15260 planting@cs.pitt.edu Abstract A new heuristic for text
More informationENSC Multimedia Communications Engineering Topic 4: Huffman Coding 2
ENSC 424 - Multimedia Communications Engineering Topic 4: Huffman Coding 2 Jie Liang Engineering Science Simon Fraser University JieL@sfu.ca J. Liang: SFU ENSC 424 1 Outline Canonical Huffman code Huffman
More informationCOMPRESSION OF SMALL TEXT FILES
COMPRESSION OF SMALL TEXT FILES Jan Platoš, Václav Snášel Department of Computer Science VŠB Technical University of Ostrava, Czech Republic jan.platos.fei@vsb.cz, vaclav.snasel@vsb.cz Eyas El-Qawasmeh
More informationIMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression
IMAGE COMPRESSION Image Compression Why? Reducing transportation times Reducing file size A two way event - compression and decompression 1 Compression categories Compression = Image coding Still-image
More informationWelcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson
Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson zhuyongxin@sjtu.edu.cn 2 Lossless Compression Algorithms 7.1 Introduction 7.2 Basics of Information
More informationLecture Coding Theory. Source Coding. Image and Video Compression. Images: Wikipedia
Lecture Coding Theory Source Coding Image and Video Compression Images: Wikipedia Entropy Coding: Unary Coding Golomb Coding Static Huffman Coding Adaptive Huffman Coding Arithmetic Coding Run Length Encoding
More informationIMAGE COMPRESSION- I. Week VIII Feb /25/2003 Image Compression-I 1
IMAGE COMPRESSION- I Week VIII Feb 25 02/25/2003 Image Compression-I 1 Reading.. Chapter 8 Sections 8.1, 8.2 8.3 (selected topics) 8.4 (Huffman, run-length, loss-less predictive) 8.5 (lossy predictive,
More informationOptimized Compression and Decompression Software
2015 IJSRSET Volume 1 Issue 3 Print ISSN : 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Optimized Compression and Decompression Software Mohd Shafaat Hussain, Manoj Yadav
More informationCompression; Error detection & correction
Compression; Error detection & correction compression: squeeze out redundancy to use less memory or use less network bandwidth encode the same information in fewer bits some bits carry no information some
More informationYou can say that again! Text compression
Activity 3 You can say that again! Text compression Age group Early elementary and up. Abilities assumed Copying written text. Time 10 minutes or more. Size of group From individuals to the whole class.
More information7.5 Dictionary-based Coding
7.5 Dictionary-based Coding LZW uses fixed-length code words to represent variable-length strings of symbols/characters that commonly occur together, e.g., words in English text LZW encoder and decoder
More informationThe Effect of Non-Greedy Parsing in Ziv-Lempel Compression Methods
The Effect of Non-Greedy Parsing in Ziv-Lempel Compression Methods R. Nigel Horspool Dept. of Computer Science, University of Victoria P. O. Box 3055, Victoria, B.C., Canada V8W 3P6 E-mail address: nigelh@csr.uvic.ca
More informationThe Effect of Flexible Parsing for Dynamic Dictionary Based Data Compression
The Effect of Flexible Parsing for Dynamic Dictionary Based Data Compression Yossi Matias Nasir Rajpoot Süleyman Cenk Ṣahinalp Abstract We report on the performance evaluation of greedy parsing with a
More informationOptimal Parsing. In Dictionary-Symbolwise. Compression Algorithms
Università degli Studi di Palermo Facoltà Di Scienze Matematiche Fisiche E Naturali Tesi Di Laurea In Scienze Dell Informazione Optimal Parsing In Dictionary-Symbolwise Compression Algorithms Il candidato
More informationUniversity of Waterloo CS240 Spring 2018 Help Session Problems
University of Waterloo CS240 Spring 2018 Help Session Problems Reminder: Final on Wednesday, August 1 2018 Note: This is a sample of problems designed to help prepare for the final exam. These problems
More informationModeling Delta Encoding of Compressed Files
Shmuel T. Klein 1, Tamar C. Serebro 1, and Dana Shapira 2 1 Department of Computer Science Bar Ilan University Ramat Gan, Israel tomi@cs.biu.ac.il, t lender@hotmail.com 2 Department of Computer Science
More informationA Hybrid Approach to Text Compression
A Hybrid Approach to Text Compression Peter C Gutmann Computer Science, University of Auckland, New Zealand Telephone +64 9 426-5097; email pgut 1 Bcs.aukuni.ac.nz Timothy C Bell Computer Science, University
More informationData Compression Techniques
Data Compression Techniques Part 1: Entropy Coding Lecture 1: Introduction and Huffman Coding Juha Kärkkäinen 31.10.2017 1 / 21 Introduction Data compression deals with encoding information in as few bits
More informationASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character.
Project 2 1 P2-0: Text Files All files are represented as binary digits including text files Each character is represented by an integer code ASCII American Standard Code for Information Interchange Text
More informationCompression Outline :Algorithms in the Real World. Lempel-Ziv Algorithms. LZ77: Sliding Window Lempel-Ziv
Compression Outline 15-853:Algorithms in the Rel World Dt Compression III Introduction: Lossy vs. Lossless, Benchmrks, Informtion Theory: Entropy, etc. Proility Coding: Huffmn + Arithmetic Coding Applictions
More informationMultimedia Networking ECE 599
Multimedia Networking ECE 599 Prof. Thinh Nguyen School of Electrical Engineering and Computer Science Based on B. Lee s lecture notes. 1 Outline Compression basics Entropy and information theory basics
More informationA SIMPLE LOSSLESS COMPRESSION ALGORITHM IN WIRELESS SENSOR NETWORKS: AN APPLICATION OF SEISMIC DATA
ISSN: 0976-3104 ARTICLE A SIMPLE LOSSLESS COMPRESSION ALGORITHM IN WIRELESS SENSOR NETWORKS: AN APPLICATION OF SEISMIC DATA J. Uthayakumar 1*, T. Vengattaraman 1, J. Amudhavel 2 1 Department of Computer
More informationGreedy Algorithms CHAPTER 16
CHAPTER 16 Greedy Algorithms In dynamic programming, the optimal solution is described in a recursive manner, and then is computed ``bottom up''. Dynamic programming is a powerful technique, but it often
More informationDigital Image Processing
Digital Image Processing Image Compression Caution: The PDF version of this presentation will appear to have errors due to heavy use of animations Material in this presentation is largely based on/derived
More informationCSE 421 Greedy: Huffman Codes
CSE 421 Greedy: Huffman Codes Yin Tat Lee 1 Compression Example 100k file, 6 letter alphabet: File Size: ASCII, 8 bits/char: 800kbits 2 3 > 6; 3 bits/char: 300kbits a 45% b 13% c 12% d 16% e 9% f 5% Why?
More informationLempel-Ziv compression: how and why?
Lempel-Ziv compression: how and why? Algorithms on Strings Paweł Gawrychowski July 9, 2013 s July 9, 2013 2/18 Outline Lempel-Ziv compression Computing the factorization Using the factorization July 9,
More informationImage compression. Stefano Ferrari. Università degli Studi di Milano Methods for Image Processing. academic year
Image compression Stefano Ferrari Università degli Studi di Milano stefano.ferrari@unimi.it Methods for Image Processing academic year 2017 2018 Data and information The representation of images in a raw
More informationIndexing. CS6200: Information Retrieval. Index Construction. Slides by: Jesse Anderton
Indexing Index Construction CS6200: Information Retrieval Slides by: Jesse Anderton Motivation: Scale Corpus Terms Docs Entries A term incidence matrix with V terms and D documents has O(V x D) entries.
More informationCHAPTER II LITERATURE REVIEW
CHAPTER II LITERATURE REVIEW 2.1 BACKGROUND OF THE STUDY The purpose of this chapter is to study and analyze famous lossless data compression algorithm, called LZW. The main objective of the study is to
More information