Lossless compression II

Size: px
Start display at page:

Download "Lossless compression II"

Transcription

1 Lossless II D 44 R 52 B 81 C 84 D 86 R 82 A 85 A 87 A 83 R 88 A 8A B 89 A 8B Symbol Probability Range a 0.2 [0.0, 0.2) e 0.3 [0.2, 0.5) i 0.1 [0.5, 0.6) o 0.2 [0.6, 0.8) u 0.1 [0.8, 0.9)! 0.1 [0.9, 1.0) CSCI 470: Web Science Keith Vertanen

2 Overview Flavors of StaBc vs. Dynamic vs. AdapBve Lempel- Ziv- Welch (LZW) Fixed- length Variable- length paqern StaBsBcal coding ArithmeBc coding PredicBon by ParBal Match (PPM) 2

3 StaBc Compression models Predefined map for all text, e.g. ASCII, Morse Code Not opbmal: different texts = different stabsbcs Dynamic Generate model based on text Requires inibal pass before can start Must transmit the model (e.g. Huffman coding) AdapBve More accurate modeling = beqer Decoding must start from beginning (e.g. LZW) 3

4 LZW Lempel- Ziv- Welch (LZW) Basis for many popular formats e.g. PNG, 7zip, gzip, jar, PDF, compress, pkzip, GIF, TIFF Algorithm basics: Maintain a table of fixed- length s for variable- length paqerns in the input Built progressively by compressor and expander Table does not need to be transmiqed! Table is of fixed size Entries for all single characters in alphabet Entries for longer subs encountered 4

5 Details: LZW example 7- bit ASCII characters, 8- bit For real use, 8- bit ~12- bit Alphabet: 128 characters longer s Codeword: 2- digit hex value = single characters, e.g. 41 = A, 52 = R 80 = end of file 81- FF = longer s, e.g. 81 = AB, 88 = ABR Employs lookahead character to add s 5

6 input A B R A C A D A B R A B R A B R A matches Compressing D 44 Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c R 52 Ini6al set of character s 6

7 input A B R A C A D A B R A B R A B R A matches A 41 D 44 R 52 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c Ini6al set of character s 7

8 input A B R A C A D A B R A B R A B R A matches A B D 44 R 52 BR 82 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c Ini6al set of character s 8

9 input A B R A C A D A B R A B R A B R A matches A B R D 44 R 52 BR 82 RA 83 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c Ini6al set of character s 9

10 input A B R A C A D A B R A B R A B R A matches A B R A D 44 R 52 BR 82 RA 83 AC 84 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c Ini6al set of character s 10

11 input A B R A C A D A B R A B R A B R A matches A B R A C A D D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c Ini6al set of character s 11

12 input A B R A C A D A B R A B R A B R A matches A B R A C A D AB D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c ABR 88 Ini6al set of character s 12

13 input A B R A C A D A B R A B R A B R A matches A B R A C A D AB RA D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c ABR 88 Ini6al set of character s RAB 89 13

14 input A B R A C A D A B R A B R A B R A matches A B R A C A D AB RA BR D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c ABR 88 Ini6al set of character s RAB 89 BRA 8A 14

15 input A B R A C A D A B R A B R A B R A matches A B R A C A D AB RA BR ABR D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c ABR 88 Ini6al set of character s RAB 89 BRA 8A ABRA 8B 15

16 input A B R A C A D A B R A B R A B R A matches A B R A C A D AB RA BR ABR A D 44 BR 82 RA 83 AC 84 CA 85 AD 86 Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c R 52 Ini6al set of character s DA 87 ABR 88 RAB 89 BRA 8A ABRA 8B Input: 17 ASCII characters * 7 bits = 119 bits Output: 12 s * 8 bits = 96 bits Compression ratio: 82% 16

17 Compression data structure LZW uses two table operabons: Find longest- prefix match from current posibon Add entry with character added to longest match Trie data structure, "retrieval" Linked tree structure Path in tree defines Compressing Find longest s in table that is prefix of unscanned input Write of s Scan one character c ahead Associate next free with s + c 17

18 D 44 R 52 B 81 C 84 D 86 R 82 A 85 A 87 A 83 R 88 A 8A B 89 A 8B AD 86 BR 82 DA 87 RA 83 ABR 88 AC 84 RAB 89 CA 85 BRA 8A 18

19 Java implementabon, LZW private static final int R = 256; private static final int L = 4096; private static final int W = 12; // number of input chars // number of s = 2^W // width public static void compress() { String input = BinaryStdIn.readString(); // read input as TST<Integer> st = new TST<Integer>(); // trie data structure for (int i = 0; i < R; i++) st.put("" + (char) i, i); int code = R+1; // s for single chars // R is for EOF } while (input.length() > 0) { String s = st.longestprefixof(input); BinaryStdOut.write(st.get(s), W); // write W- bit for s int t = s.length(); if (t < input.length() && code < L) st.put(input.sub(0, t + 1), code++); // add new input = input.sub(t); } BinaryStdOut.write(R, W); // write last BinaryStdOut.close(); 19

20 input output Expansion Write the current val Read x from input D 44 Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s R 52 Ini6al set of character s val old = A x = 42 s = B c = B next code = 81 val new = B 20

21 input output A D 44 R 52 Ini6al set of character s Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = B x = 52 s = R c = R next code = 82 val new = R 21

22 input output A B D 44 R 52 Ini6al set of character s BR 82 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = R x = 41 s = A c = A next code = 83 val new = A 22

23 input output A B R D 44 R 52 Ini6al set of character s BR 82 RA 83 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = A x = 43 s = C c = C next code = 84 val new = C 23

24 input output A B R A D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = C x = 41 s = A c = A next code = 85 val new = A 24

25 input output A B R A C D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = A x = 44 s = D c = D next code = 86 val new = D 25

26 input output A B R A C A D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = D x = 81 s = AB c = A next code = 87 val new = AB 26

27 input output A B R A C A D D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = AB x = 83 s = RA c = R next code = 88 val new = RA 27

28 input output A B R A C A D AB D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 ABR 88 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = RA x = 82 s = BR c = B next code = 89 val new = BR 28

29 input output A B R A C A D AB RA D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 ABR 88 RAB 89 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = BR x = 88 s = ABR c = A next code = 8A = RA val new 29

30 input output A B R A C A D AB RA BR D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 ABR 88 RAB 89 BRA 8A Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = ABR x = 41 s = A c = A next code = 8B = RA val new 30

31 input output A B R A C A D AB RA BR ABR D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 ABR 88 RAB 89 BRA ABRA 8A 8B Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = A x = 80 s = c = next code = val new = 31

32 input output A B R A C A D AB RA BR ABR A D 44 R 52 Ini6al set of character s BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 ABR 88 RAB 89 BRA ABRA 8A 8B Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s val old = x = s = c = next code = val new = 32

33 Expansion data structure For a W- bit, look up value An array of size 2 W D 44 R 52 BR 82 RA 83 AC 84 CA 85 AD 86 DA 87 Expansion Write the current val Read x from input Set s to for x Set next unassigned to val+c where c is 1 st char in s Set val=s ABR 88 RAB 89 BRA ABRA 8A 8B 33

34 A tricky case: ABABABA input A B A B A B A matches Ini6al set of s 34

35 A tricky case: ABABABA input A B A B A B A matches A 41 Ini6al set of s 35

36 A tricky case: ABABABA input A B A B A B A matches A B BA 82 Ini6al set of s 36

37 A tricky case: ABABABA input A B A B A B A matches A B AB BA 82 ABA 83 Ini6al set of s 37

38 A tricky case: ABABABA input A B A B A B A matches A B AB ABA BA 82 ABA 83 NoBce right aler adding entry for "AB" the subsequent text had "AB" as a prefix. Ini6al set of s 38

39 A tricky case: expansion ABABABA input output Ini6al set of s 39

40 A tricky case: expansion ABABABA input output A Ini6al set of s 40

41 A tricky case: expansion ABABABA input output A B BA 82 Ini6al set of s 41

42 A tricky case: expansion ABABABA input output A B AB BA 82 We need to know the first character of 83 in order to enter 83 into the table! Fix: First character of 83 must be same as first leqer of current 81, i.e. "A". Ini6al set of s 42

43 A tricky case: expansion ABABABA input output A B AB ABA BA 82 ABA 83 Ini6al set of s 43

44 Java implementabon, LZW expansion private static final int R = 256; private static final int L = 4096; private static final int W = 12; // number of input chars // number of s = 2^W // width public static void expand() { String[] st = new String[L]; int i; // next available value // initialize symbol table with all 1- character s for (i = 0; i < R; i++) st[i] = "" + (char) i; st[i++] = ""; // (unused) lookahead for EOF int = BinaryStdIn.readInt(W); String val = st[]; } while (true) { BinaryStdOut.write(val); = BinaryStdIn.readInt(W); if ( == R) break; String s = st[]; if (i == ) s = val + val.charat(0); if (i < L) st[i++] = val + s.charat(0); val = s; } BinaryStdOut.close(); // special case hack 44

45 LZW decisions How big of a symbol table? How long is message? Whole message similar model? Many variabons What to do when symbol table fills up? Throw away and start over (e.g. GIF) Throw away when not effecbve (e.g. Unix compress) Many variabons Put longer subs in symbol table? Many variabons 45

46 Lempel- Ziv variants LZW variants LZ77, published Lempel & Ziv, 1977, not patented PNG LZ78, published Lempel & Ziv in 1978, patented LZW, Welch extension to LZ78, patented (expired 2003) GIF, TIFF, Pkzip Deflate = LZ77 variant + Huffman coding 7zip, gzip, jar, PDF 46

47 Some experiments Compressing Moby Dick Using Java programs for LZW and Huffman Different dicbonary sizes for LZW or reseqng when full Compared to zip and gzip 1,191,463 mobydick.txt 485,561 mobydick.gz 485,790 mobydick.zip 398,480 mobydick.xz 371,394 mobydick.bz2 667,651 mobydick.huff 597,060 mobydick.lzw 814,578 mobydick.huff.lzw 592,830 mobydick.lzw.huff 682,400 mobydick.lzw.reset 541,261 mobydick.lzw14 521,700 mobydick.lzw15 503,050 mobydick.lzw16 501,193 mobydick.lzw16.huff 521,700 mobydick.lzw17 514,393 mobydick.lzw18 571,548 mobydick.lzw20 47

48 Enter stabsbcal coding Natural language quite predictable ~ 1 bit of entropy per symbol Huffman coding sbll requires 1- bit min per symbol We're forced to use an integral number of bits DicBonary- based (LZW and friends) Memorizes sequences, but no nobon of frequency StaBsBcal coding Use long, specific context for predicbon P(A The_United_State_of_) =? Blend knowledge using contexts of different lengths AdapBve: model updates and change as text seen Olen aler every leqer! 48

49 MacKay, D.J.C. Informa6on Theory, Inference, and Learning Algorithms. 49

50 Guess the phrase 50

51 A simple unigram model of English ' </s> <sp> a b c d e f g h i j k l m n o p q r s t u v w x y z

52 From text to a real number ArithmeBc coding Message represented by real interval in [0, 1) More precise interval = specifies more bits Example e.g. [ , ) = "it was the best of Bmes" Or any number in that interval, Alphabet = {aeiou!} Transmission = eaii! Symbol Probability Range a 0.2 [0.0, 0.2) e 0.3 [0.2, 0.5) i 0.1 [0.5, 0.6) o 0.2 [0.6, 0.8) u 0.1 [0.8, 0.9)! 0.1 [0.9, 1.0) 52

53 Transmission = eaii! Bell, Cleary, WiKen. Text Compression. AEer seeing Range Symbol Probability Range [0.0, 1.0] a 0.2 [0.0, 0.2) e [0.2, 0.5) e 0.3 [0.2, 0.5) a [0.2, 0.26) i 0.1 [0.5, 0.6) i [0.23, 0.236) o 0.2 [0.6, 0.8) i [0.233, ) u 0.1 [0.8, 0.9)! [ , )! 0.1 [0.9, 1.0) Note: Encoder/decoder agree on symbol to terminate message, here we use! 53

54 Context modeling PredicBon by ParBal Match (PPM) P(next symbol) depends on previous symbol(s) Blending for dealing with 0- frequency problem Easy to build as adapbve Learn from the text as you go No trie to send as in Huffman coding 1984, but sbll compebbve for text Various variants, PPM- A/B/C/D/Z/* Implemented in 7- zip, open source packages, PPM for XML, PPM for executables, 54

55 PPM An alphabet A with q symbols Order = How many previous symbols to use 2 = condibon on 2 previous symbols, P(x n x n- 1, x n- 2 ) 1 = condibon on 1 previous symbol, P(x n x n- 1 ) 0 = condibon on current symbol, P(x n ) - 1 = uniform over the alphabet, P(x n ) = 1/q For a given order: Probability of next symbol based on counbng occurrences given prior context 55

56 Escape probabilibes Need weights to blend probabilibes Compute based on "escape" probability Allocate some mass in each order for when a lower- order model should make predicbon instead Method A: Add one to C 0 : count of characters seen in this context e o = 1 / (C 0 + 1) Method D: u = number of unique characters seen in this context e o = (u / 2) / C 0 56

57 Bell, Cleary, WiKen. Text Compression. 57

58 Lossless benchmarks year scheme bits /char 1967 ASCII Huffman LZ LZMW LZH move- to- front LZB gzip PPMC PPM Burrows- Wheeler BOA RK 1.89 Data using Calgary corpus 58

59 Compressing Moby Dick 1,191,463 mobydick.txt 485,561 mobydick.gz 485,790 mobydick.zip 398,480 mobydick.xz 371,394 mobydick.bz2 667,651 mobydick.huff 597,060 mobydick.lzw 814,578 mobydick.huff.lzw 592,830 mobydick.lzw.huff 682,400 mobydick.lzw.reset 541,261 mobydick.lzw14 521,700 mobydick.lzw15 503,050 mobydick.lzw16 501,193 mobydick.lzw16.huff 521,700 mobydick.lzw17 514,393 mobydick.lzw18 571,548 mobydick.lzw20 325,378 mobydick.ppmd 306,708 mobydick.zpaq 59

60 Summary DicBonary- based LZW and variants Memorizes sequences in data StaBsBcal coding Language model produces probabilibes Probability sequence defines point on [0, 1) Use arithmebc coding to convert to bits PredicBon by ParBal Match (PPM) 60

Lossless compression II

Lossless compression II Lossless II D 44 R 52 B 81 C 84 D 86 R 82 A 85 A 87 A 83 R 88 A 8A B 89 A 8B Symbol Probability Range a 0.2 [0.0, 0.2) e 0.3 [0.2, 0.5) i 0.1 [0.5, 0.6) o 0.2 [0.6, 0.8) u 0.1 [0.8, 0.9)! 0.1 [0.9, 1.0)

More information

5.5 Data Compression

5.5 Data Compression 5.5 Data Compression basics run-length coding Huffman compression LZW compression Algorithms, 4 th Edition Robert Sedgewick and Kevin Wayne Copyright 2002 2010 February 8, 2011 2:50:01 PM Data compression

More information

Entropy Coding. - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic Code

Entropy Coding. - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic Code Entropy Coding } different probabilities for the appearing of single symbols are used - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic

More information

5.5 Data Compression

5.5 Data Compression 5.5 Data Compression basics run-length encoding Huffman compression LZW compression Algorithms in Java, 4 th Edition Robert Sedgewick and Kevin Wayne Copyright 2009 January 26, 2010 8:42:01 AM Data compression

More information

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE Algorithms ROBERT SEDGEWICK KEVIN WAYNE 5.5 DATA COMPRESSION Algorithms F O U R T H E D I T I O N introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE http://algs4.cs.princeton.edu

More information

Algorithms 5.5 DATA COMPRESSION. basics run-length coding Huffman compression LZW compression

Algorithms 5.5 DATA COMPRESSION. basics run-length coding Huffman compression LZW compression 5.5 DATA COMPRESSION Algorithms F O U R T H E D I T I O N basics run-length coding Huffman compression LZW compression R O B E R T S E D G E W I C K K E V I N W A Y N E Algorithms, 4 th Edition Robert

More information

CS 493: Algorithms for Massive Data Sets Dictionary-based compression February 14, 2002 Scribe: Tony Wirth LZ77

CS 493: Algorithms for Massive Data Sets Dictionary-based compression February 14, 2002 Scribe: Tony Wirth LZ77 CS 493: Algorithms for Massive Data Sets February 14, 2002 Dictionary-based compression Scribe: Tony Wirth This lecture will explore two adaptive dictionary compression schemes: LZ77 and LZ78. We use the

More information

EE-575 INFORMATION THEORY - SEM 092

EE-575 INFORMATION THEORY - SEM 092 EE-575 INFORMATION THEORY - SEM 092 Project Report on Lempel Ziv compression technique. Department of Electrical Engineering Prepared By: Mohammed Akber Ali Student ID # g200806120. ------------------------------------------------------------------------------------------------------------------------------------------

More information

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE Algorithms ROBERT SEDGEWICK KEVIN WAYNE 5.5 DATA COMPRESSION Algorithms F O U R T H E D I T I O N introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE https://algs4.cs.princeton.edu

More information

Simple variant of coding with a variable number of symbols and fixlength codewords.

Simple variant of coding with a variable number of symbols and fixlength codewords. Dictionary coding Simple variant of coding with a variable number of symbols and fixlength codewords. Create a dictionary containing 2 b different symbol sequences and code them with codewords of length

More information

Data Compression. Guest lecture, SGDS Fall 2011

Data Compression. Guest lecture, SGDS Fall 2011 Data Compression Guest lecture, SGDS Fall 2011 1 Basics Lossy/lossless Alphabet compaction Compression is impossible Compression is possible RLE Variable-length codes Undecidable Pigeon-holes Patterns

More information

CS/COE 1501

CS/COE 1501 CS/COE 1501 www.cs.pitt.edu/~lipschultz/cs1501/ Compression What is compression? Represent the same data using less storage space Can get more use out a disk of a given size Can get more use out of memory

More information

CS/COE 1501

CS/COE 1501 CS/COE 1501 www.cs.pitt.edu/~nlf4/cs1501/ Compression What is compression? Represent the same data using less storage space Can get more use out a disk of a given size Can get more use out of memory E.g.,

More information

Data Compression 신찬수

Data Compression 신찬수 Data Compression 신찬수 Data compression Reducing the size of the representation without affecting the information itself. Lossless compression vs. lossy compression text file image file movie file compression

More information

Multimedia Systems. Part 20. Mahdi Vasighi

Multimedia Systems. Part 20. Mahdi Vasighi Multimedia Systems Part 2 Mahdi Vasighi www.iasbs.ac.ir/~vasighi Department of Computer Science and Information Technology, Institute for dvanced Studies in asic Sciences, Zanjan, Iran rithmetic Coding

More information

SIGNAL COMPRESSION Lecture Lempel-Ziv Coding

SIGNAL COMPRESSION Lecture Lempel-Ziv Coding SIGNAL COMPRESSION Lecture 5 11.9.2007 Lempel-Ziv Coding Dictionary methods Ziv-Lempel 77 The gzip variant of Ziv-Lempel 77 Ziv-Lempel 78 The LZW variant of Ziv-Lempel 78 Asymptotic optimality of Ziv-Lempel

More information

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints:

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints: CS231 Algorithms Handout # 31 Prof. Lyn Turbak November 20, 2001 Wellesley College Compression The Big Picture We want to be able to store and retrieve data, as well as communicate it with others. In general,

More information

Data compression with Huffman and LZW

Data compression with Huffman and LZW Data compression with Huffman and LZW André R. Brodtkorb, Andre.Brodtkorb@sintef.no Outline Data storage and compression Huffman: how it works and where it's used LZW: how it works and where it's used

More information

Ch. 2: Compression Basics Multimedia Systems

Ch. 2: Compression Basics Multimedia Systems Ch. 2: Compression Basics Multimedia Systems Prof. Thinh Nguyen (Based on Prof. Ben Lee s Slides) Oregon State University School of Electrical Engineering and Computer Science Outline Why compression?

More information

Data Compression Techniques

Data Compression Techniques Data Compression Techniques Part 2: Text Compression Lecture 6: Dictionary Compression Juha Kärkkäinen 15.11.2017 1 / 17 Dictionary Compression The compression techniques we have seen so far replace individual

More information

LZW Compression. Ramana Kumar Kundella. Indiana State University December 13, 2014

LZW Compression. Ramana Kumar Kundella. Indiana State University December 13, 2014 LZW Compression Ramana Kumar Kundella Indiana State University rkundella@sycamores.indstate.edu December 13, 2014 Abstract LZW is one of the well-known lossless compression methods. Since it has several

More information

THE RELATIVE EFFICIENCY OF DATA COMPRESSION BY LZW AND LZSS

THE RELATIVE EFFICIENCY OF DATA COMPRESSION BY LZW AND LZSS THE RELATIVE EFFICIENCY OF DATA COMPRESSION BY LZW AND LZSS Yair Wiseman 1* * 1 Computer Science Department, Bar-Ilan University, Ramat-Gan 52900, Israel Email: wiseman@cs.huji.ac.il, http://www.cs.biu.ac.il/~wiseman

More information

Comparative Study of Dictionary based Compression Algorithms on Text Data

Comparative Study of Dictionary based Compression Algorithms on Text Data 88 Comparative Study of Dictionary based Compression Algorithms on Text Data Amit Jain Kamaljit I. Lakhtaria Sir Padampat Singhania University, Udaipur (Raj.) 323601 India Abstract: With increasing amount

More information

CIS 121 Data Structures and Algorithms with Java Spring 2018

CIS 121 Data Structures and Algorithms with Java Spring 2018 CIS 121 Data Structures and Algorithms with Java Spring 2018 Homework 6 Compression Due: Monday, March 12, 11:59pm online 2 Required Problems (45 points), Qualitative Questions (10 points), and Style and

More information

ITCT Lecture 8.2: Dictionary Codes and Lempel-Ziv Coding

ITCT Lecture 8.2: Dictionary Codes and Lempel-Ziv Coding ITCT Lecture 8.2: Dictionary Codes and Lempel-Ziv Coding Huffman codes require us to have a fairly reasonable idea of how source symbol probabilities are distributed. There are a number of applications

More information

David Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc.

David Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc. David Rappaport School of Computing Queen s University CANADA Copyright, 1996 Dale Carnegie & Associates, Inc. Data Compression There are two broad categories of data compression: Lossless Compression

More information

Distributed source coding

Distributed source coding Distributed source coding Suppose that we want to encode two sources (X, Y ) with joint probability mass function p(x, y). If the encoder has access to both X and Y, it is sufficient to use a rate R >

More information

EE67I Multimedia Communication Systems Lecture 4

EE67I Multimedia Communication Systems Lecture 4 EE67I Multimedia Communication Systems Lecture 4 Lossless Compression Basics of Information Theory Compression is either lossless, in which no information is lost, or lossy in which information is lost.

More information

Compressing Data. Konstantin Tretyakov

Compressing Data. Konstantin Tretyakov Compressing Data Konstantin Tretyakov (kt@ut.ee) MTAT.03.238 Advanced April 26, 2012 Claude Elwood Shannon (1916-2001) C. E. Shannon. A mathematical theory of communication. 1948 C. E. Shannon. The mathematical

More information

Dictionary techniques

Dictionary techniques Dictionary techniques The final concept that we will mention in this chapter is about dictionary techniques. Many modern compression algorithms rely on the modified versions of various dictionary techniques.

More information

A Comparative Study Of Text Compression Algorithms

A Comparative Study Of Text Compression Algorithms International Journal of Wisdom Based Computing, Vol. 1 (3), December 2011 68 A Comparative Study Of Text Compression Algorithms Senthil Shanmugasundaram Department of Computer Science, Vidyasagar College

More information

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE lgorithms ROBERT SEDGEWICK KEVIN WYNE 5.5 DT COMPRESSION lgorithms F O U R T H E D I T I O N introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WYNE http://algs4.cs.princeton.edu

More information

Fundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding

Fundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding Fundamentals of Multimedia Lecture 5 Lossless Data Compression Variable Length Coding Mahmoud El-Gayyar elgayyar@ci.suez.edu.eg Mahmoud El-Gayyar / Fundamentals of Multimedia 1 Data Compression Compression

More information

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE lgorithms OBET SEDGEWICK KEVIN WYNE 5.5 DT COMPESSION lgorithms F O U T H E D I T I O N introduction run-length coding Huffman compression LZW compression OBET SEDGEWICK KEVIN WYNE http://algs4.cs.princeton.edu

More information

7: Image Compression

7: Image Compression 7: Image Compression Mark Handley Image Compression GIF (Graphics Interchange Format) PNG (Portable Network Graphics) MNG (Multiple-image Network Graphics) JPEG (Join Picture Expert Group) 1 GIF (Graphics

More information

Image coding and compression

Image coding and compression Image coding and compression Robin Strand Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Today Information and Data Redundancy Image Quality Compression Coding

More information

Data Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression

Data Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression An overview of Compression Multimedia Systems and Applications Data Compression Compression becomes necessary in multimedia because it requires large amounts of storage space and bandwidth Types of Compression

More information

Chapter 7 Lossless Compression Algorithms

Chapter 7 Lossless Compression Algorithms Chapter 7 Lossless Compression Algorithms 7.1 Introduction 7.2 Basics of Information Theory 7.3 Run-Length Coding 7.4 Variable-Length Coding (VLC) 7.5 Dictionary-based Coding 7.6 Arithmetic Coding 7.7

More information

WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION

WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION WIRE/WIRELESS SENSOR NETWORKS USING K-RLE ALGORITHM FOR A LOW POWER DATA COMPRESSION V.KRISHNAN1, MR. R.TRINADH 2 1 M. Tech Student, 2 M. Tech., Assistant Professor, Dept. Of E.C.E, SIR C.R. Reddy college

More information

Data Compression. Media Signal Processing, Presentation 2. Presented By: Jahanzeb Farooq Michael Osadebey

Data Compression. Media Signal Processing, Presentation 2. Presented By: Jahanzeb Farooq Michael Osadebey Data Compression Media Signal Processing, Presentation 2 Presented By: Jahanzeb Farooq Michael Osadebey What is Data Compression? Definition -Reducing the amount of data required to represent a source

More information

IMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I

IMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I IMAGE PROCESSING (RRY025) LECTURE 13 IMAGE COMPRESSION - I 1 Need For Compression 2D data sets are much larger than 1D. TV and movie data sets are effectively 3D (2-space, 1-time). Need Compression for

More information

Lempel-Ziv-Welch (LZW) Compression Algorithm

Lempel-Ziv-Welch (LZW) Compression Algorithm Lempel-Ziv-Welch (LZW) Compression lgorithm Introduction to the LZW lgorithm Example 1: Encoding using LZW Example 2: Decoding using LZW LZW: Concluding Notes Introduction to LZW s mentioned earlier, static

More information

Study of LZ77 and LZ78 Data Compression Techniques

Study of LZ77 and LZ78 Data Compression Techniques Study of LZ77 and LZ78 Data Compression Techniques Suman M. Choudhary, Anjali S. Patel, Sonal J. Parmar Abstract Data Compression is defined as the science and art of the representation of information

More information

Ch. 2: Compression Basics Multimedia Systems

Ch. 2: Compression Basics Multimedia Systems Ch. 2: Compression Basics Multimedia Systems Prof. Ben Lee School of Electrical Engineering and Computer Science Oregon State University Outline Why compression? Classification Entropy and Information

More information

Error Resilient LZ 77 Data Compression

Error Resilient LZ 77 Data Compression Error Resilient LZ 77 Data Compression Stefano Lonardi Wojciech Szpankowski Mark Daniel Ward Presentation by Peter Macko Motivation Lempel-Ziv 77 lacks any form of error correction Introducing a single

More information

An Advanced Text Encryption & Compression System Based on ASCII Values & Arithmetic Encoding to Improve Data Security

An Advanced Text Encryption & Compression System Based on ASCII Values & Arithmetic Encoding to Improve Data Security Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 10, October 2014,

More information

TEXT COMPRESSION ALGORITHMS - A COMPARATIVE STUDY

TEXT COMPRESSION ALGORITHMS - A COMPARATIVE STUDY S SENTHIL AND L ROBERT: TEXT COMPRESSION ALGORITHMS A COMPARATIVE STUDY DOI: 10.21917/ijct.2011.0062 TEXT COMPRESSION ALGORITHMS - A COMPARATIVE STUDY S. Senthil 1 and L. Robert 2 1 Department of Computer

More information

Lossless Compression Algorithms

Lossless Compression Algorithms Multimedia Data Compression Part I Chapter 7 Lossless Compression Algorithms 1 Chapter 7 Lossless Compression Algorithms 1. Introduction 2. Basics of Information Theory 3. Lossless Compression Algorithms

More information

FPGA based Data Compression using Dictionary based LZW Algorithm

FPGA based Data Compression using Dictionary based LZW Algorithm FPGA based Data Compression using Dictionary based LZW Algorithm Samish Kamble PG Student, E & TC Department, D.Y. Patil College of Engineering, Kolhapur, India Prof. S B Patil Asso.Professor, E & TC Department,

More information

arxiv: v2 [cs.it] 15 Jan 2011

arxiv: v2 [cs.it] 15 Jan 2011 Improving PPM Algorithm Using Dictionaries Yichuan Hu Department of Electrical and Systems Engineering University of Pennsylvania Email: yichuan@seas.upenn.edu Jianzhong (Charlie) Zhang, Farooq Khan and

More information

ECE 499/599 Data Compression & Information Theory. Thinh Nguyen Oregon State University

ECE 499/599 Data Compression & Information Theory. Thinh Nguyen Oregon State University ECE 499/599 Data Compression & Information Theory Thinh Nguyen Oregon State University Adminstrivia Office Hours TTh: 2-3 PM Kelley Engineering Center 3115 Class homepage http://www.eecs.orst.edu/~thinhq/teaching/ece499/spring06/spring06.html

More information

Overview. Last Lecture. This Lecture. Next Lecture. Data Transmission. Data Compression Source: Lecture notes

Overview. Last Lecture. This Lecture. Next Lecture. Data Transmission. Data Compression Source: Lecture notes Overview Last Lecture Data Transmission This Lecture Data Compression Source: Lecture notes Next Lecture Data Integrity 1 Source : Sections 10.1, 10.3 Lecture 4 Data Compression 1 Data Compression Decreases

More information

S 1. Evaluation of Fast-LZ Compressors for Compacting High-Bandwidth but Redundant Streams from FPGA Data Sources

S 1. Evaluation of Fast-LZ Compressors for Compacting High-Bandwidth but Redundant Streams from FPGA Data Sources Evaluation of Fast-LZ Compressors for Compacting High-Bandwidth but Redundant Streams from FPGA Data Sources Author: Supervisor: Luhao Liu Dr. -Ing. Thomas B. Preußer Dr. -Ing. Steffen Köhler 09.10.2014

More information

Compression of Concatenated Web Pages Using XBW

Compression of Concatenated Web Pages Using XBW Compression of Concatenated Web Pages Using XBW Radovan Šesták and Jan Lánský Charles University, Faculty of Mathematics and Physics, Department of Software Engineering Malostranské nám. 25, 118 00 Praha

More information

Engineering Mathematics II Lecture 16 Compression

Engineering Mathematics II Lecture 16 Compression 010.141 Engineering Mathematics II Lecture 16 Compression Bob McKay School of Computer Science and Engineering College of Engineering Seoul National University 1 Lossless Compression Outline Huffman &

More information

A study in compression algorithms

A study in compression algorithms Master Thesis Computer Science Thesis no: MCS-004:7 January 005 A study in compression algorithms Mattias Håkansson Sjöstrand Department of Interaction and System Design School of Engineering Blekinge

More information

Introduction to Data Compression

Introduction to Data Compression Introduction to Data Compression Guillaume Tochon guillaume.tochon@lrde.epita.fr LRDE, EPITA Guillaume Tochon (LRDE) CODO - Introduction 1 / 9 Data compression: whatizit? Guillaume Tochon (LRDE) CODO -

More information

An On-line Variable Length Binary. Institute for Systems Research and. Institute for Advanced Computer Studies. University of Maryland

An On-line Variable Length Binary. Institute for Systems Research and. Institute for Advanced Computer Studies. University of Maryland An On-line Variable Length inary Encoding Tinku Acharya Joseph F. Ja Ja Institute for Systems Research and Institute for Advanced Computer Studies University of Maryland College Park, MD 242 facharya,

More information

Grayscale true two-dimensional dictionary-based image compression

Grayscale true two-dimensional dictionary-based image compression J. Vis. Commun. Image R. 18 (2007) 35 44 www.elsevier.com/locate/jvci Grayscale true two-dimensional dictionary-based image compression Nathanael J. Brittain, Mahmoud R. El-Sakka * Computer Science Department,

More information

Text Compression. Jayadev Misra The University of Texas at Austin July 1, A Very Incomplete Introduction To Information Theory 2

Text Compression. Jayadev Misra The University of Texas at Austin July 1, A Very Incomplete Introduction To Information Theory 2 Text Compression Jayadev Misra The University of Texas at Austin July 1, 2003 Contents 1 Introduction 1 2 A Very Incomplete Introduction To Information Theory 2 3 Huffman Coding 5 3.1 Uniquely Decodable

More information

DEFLATE COMPRESSION ALGORITHM

DEFLATE COMPRESSION ALGORITHM DEFLATE COMPRESSION ALGORITHM Savan Oswal 1, Anjali Singh 2, Kirthi Kumari 3 B.E Student, Department of Information Technology, KJ'S Trinity College Of Engineering and Research, Pune, India 1,2.3 Abstract

More information

11 Dictionary-based compressors

11 Dictionary-based compressors 11 Dictionary-based compressors 11.1 LZ77... 11-1 11.2 LZ78... 11-4 11.3 LZW... 11-6 11.4 On the optimality of compressors... 11-7 The methods based on a dictionary take a totally different approach to

More information

Minification techniques

Minification techniques Minification techniques We have already discussed scaling images Enlarging an image well relies solely on good interpolation. We cannot add information to an image. Nearest neighbour will always give horrible

More information

5.5 Data Compression. basics run-length coding Huffman compression LZW compression. Data compression

5.5 Data Compression. basics run-length coding Huffman compression LZW compression. Data compression 5.5 ata ompression ata compression ompression reduces the size of a file: To save space when storing it. To save time when transmitting it. Most files have lots of redundancy. basics run-length coding

More information

Repetition 1st lecture

Repetition 1st lecture Repetition 1st lecture Human Senses in Relation to Technical Parameters Multimedia - what is it? Human senses (overview) Historical remarks Color models RGB Y, Cr, Cb Data rates Text, Graphic Picture,

More information

Lossless compression B 1 U 1 B 2 C R D! CSCI 470: Web Science Keith Vertanen

Lossless compression B 1 U 1 B 2 C R D! CSCI 470: Web Science Keith Vertanen Lossless compression B U B U B 2 U B A ϵ CSCI 47: Web Science Keith Vertanen C R D! Lossless compression Mo7va7on Overview Rules and limits of the game Things to exploit Run- length encoding (RLE) Exploit

More information

11 Dictionary-based compressors

11 Dictionary-based compressors 11 Dictionary-based compressors 11.1 LZ77... 11-1 11.2 LZ78... 11-4 11.3 LZW... 11-6 11.4 On the optimality of compressors... 11-7 The methods based on a dictionary take a totally different approach to

More information

GUJARAT TECHNOLOGICAL UNIVERSITY

GUJARAT TECHNOLOGICAL UNIVERSITY GUJARAT TECHNOLOGICAL UNIVERSITY INFORMATION TECHNOLOGY DATA COMPRESSION AND DATA RETRIVAL SUBJECT CODE: 2161603 B.E. 6 th SEMESTER Type of course: Core Prerequisite: None Rationale: Data compression refers

More information

G64PMM - Lecture 3.2. Analogue vs Digital. Analogue Media. Graphics & Still Image Representation

G64PMM - Lecture 3.2. Analogue vs Digital. Analogue Media. Graphics & Still Image Representation G64PMM - Lecture 3.2 Graphics & Still Image Representation Analogue vs Digital Analogue information Continuously variable signal Physical phenomena Sound/light/temperature/position/pressure Waveform Electromagnetic

More information

MODELING DELTA ENCODING OF COMPRESSED FILES. and. and

MODELING DELTA ENCODING OF COMPRESSED FILES. and. and International Journal of Foundations of Computer Science c World Scientific Publishing Company MODELING DELTA ENCODING OF COMPRESSED FILES SHMUEL T. KLEIN Department of Computer Science, Bar-Ilan University

More information

Modeling Delta Encoding of Compressed Files

Modeling Delta Encoding of Compressed Files Modeling Delta Encoding of Compressed Files EXTENDED ABSTRACT S.T. Klein, T.C. Serebro, and D. Shapira 1 Dept of CS Bar Ilan University Ramat Gan, Israel tomi@cs.biu.ac.il 2 Dept of CS Bar Ilan University

More information

An Asymmetric, Semi-adaptive Text Compression Algorithm

An Asymmetric, Semi-adaptive Text Compression Algorithm An Asymmetric, Semi-adaptive Text Compression Algorithm Harry Plantinga Department of Computer Science University of Pittsburgh Pittsburgh, PA 15260 planting@cs.pitt.edu Abstract A new heuristic for text

More information

ENSC Multimedia Communications Engineering Topic 4: Huffman Coding 2

ENSC Multimedia Communications Engineering Topic 4: Huffman Coding 2 ENSC 424 - Multimedia Communications Engineering Topic 4: Huffman Coding 2 Jie Liang Engineering Science Simon Fraser University JieL@sfu.ca J. Liang: SFU ENSC 424 1 Outline Canonical Huffman code Huffman

More information

COMPRESSION OF SMALL TEXT FILES

COMPRESSION OF SMALL TEXT FILES COMPRESSION OF SMALL TEXT FILES Jan Platoš, Václav Snášel Department of Computer Science VŠB Technical University of Ostrava, Czech Republic jan.platos.fei@vsb.cz, vaclav.snasel@vsb.cz Eyas El-Qawasmeh

More information

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression

IMAGE COMPRESSION. Image Compression. Why? Reducing transportation times Reducing file size. A two way event - compression and decompression IMAGE COMPRESSION Image Compression Why? Reducing transportation times Reducing file size A two way event - compression and decompression 1 Compression categories Compression = Image coding Still-image

More information

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson zhuyongxin@sjtu.edu.cn 2 Lossless Compression Algorithms 7.1 Introduction 7.2 Basics of Information

More information

Lecture Coding Theory. Source Coding. Image and Video Compression. Images: Wikipedia

Lecture Coding Theory. Source Coding. Image and Video Compression. Images: Wikipedia Lecture Coding Theory Source Coding Image and Video Compression Images: Wikipedia Entropy Coding: Unary Coding Golomb Coding Static Huffman Coding Adaptive Huffman Coding Arithmetic Coding Run Length Encoding

More information

IMAGE COMPRESSION- I. Week VIII Feb /25/2003 Image Compression-I 1

IMAGE COMPRESSION- I. Week VIII Feb /25/2003 Image Compression-I 1 IMAGE COMPRESSION- I Week VIII Feb 25 02/25/2003 Image Compression-I 1 Reading.. Chapter 8 Sections 8.1, 8.2 8.3 (selected topics) 8.4 (Huffman, run-length, loss-less predictive) 8.5 (lossy predictive,

More information

Optimized Compression and Decompression Software

Optimized Compression and Decompression Software 2015 IJSRSET Volume 1 Issue 3 Print ISSN : 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Optimized Compression and Decompression Software Mohd Shafaat Hussain, Manoj Yadav

More information

Compression; Error detection & correction

Compression; Error detection & correction Compression; Error detection & correction compression: squeeze out redundancy to use less memory or use less network bandwidth encode the same information in fewer bits some bits carry no information some

More information

You can say that again! Text compression

You can say that again! Text compression Activity 3 You can say that again! Text compression Age group Early elementary and up. Abilities assumed Copying written text. Time 10 minutes or more. Size of group From individuals to the whole class.

More information

7.5 Dictionary-based Coding

7.5 Dictionary-based Coding 7.5 Dictionary-based Coding LZW uses fixed-length code words to represent variable-length strings of symbols/characters that commonly occur together, e.g., words in English text LZW encoder and decoder

More information

The Effect of Non-Greedy Parsing in Ziv-Lempel Compression Methods

The Effect of Non-Greedy Parsing in Ziv-Lempel Compression Methods The Effect of Non-Greedy Parsing in Ziv-Lempel Compression Methods R. Nigel Horspool Dept. of Computer Science, University of Victoria P. O. Box 3055, Victoria, B.C., Canada V8W 3P6 E-mail address: nigelh@csr.uvic.ca

More information

The Effect of Flexible Parsing for Dynamic Dictionary Based Data Compression

The Effect of Flexible Parsing for Dynamic Dictionary Based Data Compression The Effect of Flexible Parsing for Dynamic Dictionary Based Data Compression Yossi Matias Nasir Rajpoot Süleyman Cenk Ṣahinalp Abstract We report on the performance evaluation of greedy parsing with a

More information

Optimal Parsing. In Dictionary-Symbolwise. Compression Algorithms

Optimal Parsing. In Dictionary-Symbolwise. Compression Algorithms Università degli Studi di Palermo Facoltà Di Scienze Matematiche Fisiche E Naturali Tesi Di Laurea In Scienze Dell Informazione Optimal Parsing In Dictionary-Symbolwise Compression Algorithms Il candidato

More information

University of Waterloo CS240 Spring 2018 Help Session Problems

University of Waterloo CS240 Spring 2018 Help Session Problems University of Waterloo CS240 Spring 2018 Help Session Problems Reminder: Final on Wednesday, August 1 2018 Note: This is a sample of problems designed to help prepare for the final exam. These problems

More information

Modeling Delta Encoding of Compressed Files

Modeling Delta Encoding of Compressed Files Shmuel T. Klein 1, Tamar C. Serebro 1, and Dana Shapira 2 1 Department of Computer Science Bar Ilan University Ramat Gan, Israel tomi@cs.biu.ac.il, t lender@hotmail.com 2 Department of Computer Science

More information

A Hybrid Approach to Text Compression

A Hybrid Approach to Text Compression A Hybrid Approach to Text Compression Peter C Gutmann Computer Science, University of Auckland, New Zealand Telephone +64 9 426-5097; email pgut 1 Bcs.aukuni.ac.nz Timothy C Bell Computer Science, University

More information

Data Compression Techniques

Data Compression Techniques Data Compression Techniques Part 1: Entropy Coding Lecture 1: Introduction and Huffman Coding Juha Kärkkäinen 31.10.2017 1 / 21 Introduction Data compression deals with encoding information in as few bits

More information

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character.

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character. Project 2 1 P2-0: Text Files All files are represented as binary digits including text files Each character is represented by an integer code ASCII American Standard Code for Information Interchange Text

More information

Compression Outline :Algorithms in the Real World. Lempel-Ziv Algorithms. LZ77: Sliding Window Lempel-Ziv

Compression Outline :Algorithms in the Real World. Lempel-Ziv Algorithms. LZ77: Sliding Window Lempel-Ziv Compression Outline 15-853:Algorithms in the Rel World Dt Compression III Introduction: Lossy vs. Lossless, Benchmrks, Informtion Theory: Entropy, etc. Proility Coding: Huffmn + Arithmetic Coding Applictions

More information

Multimedia Networking ECE 599

Multimedia Networking ECE 599 Multimedia Networking ECE 599 Prof. Thinh Nguyen School of Electrical Engineering and Computer Science Based on B. Lee s lecture notes. 1 Outline Compression basics Entropy and information theory basics

More information

A SIMPLE LOSSLESS COMPRESSION ALGORITHM IN WIRELESS SENSOR NETWORKS: AN APPLICATION OF SEISMIC DATA

A SIMPLE LOSSLESS COMPRESSION ALGORITHM IN WIRELESS SENSOR NETWORKS: AN APPLICATION OF SEISMIC DATA ISSN: 0976-3104 ARTICLE A SIMPLE LOSSLESS COMPRESSION ALGORITHM IN WIRELESS SENSOR NETWORKS: AN APPLICATION OF SEISMIC DATA J. Uthayakumar 1*, T. Vengattaraman 1, J. Amudhavel 2 1 Department of Computer

More information

Greedy Algorithms CHAPTER 16

Greedy Algorithms CHAPTER 16 CHAPTER 16 Greedy Algorithms In dynamic programming, the optimal solution is described in a recursive manner, and then is computed ``bottom up''. Dynamic programming is a powerful technique, but it often

More information

Digital Image Processing

Digital Image Processing Digital Image Processing Image Compression Caution: The PDF version of this presentation will appear to have errors due to heavy use of animations Material in this presentation is largely based on/derived

More information

CSE 421 Greedy: Huffman Codes

CSE 421 Greedy: Huffman Codes CSE 421 Greedy: Huffman Codes Yin Tat Lee 1 Compression Example 100k file, 6 letter alphabet: File Size: ASCII, 8 bits/char: 800kbits 2 3 > 6; 3 bits/char: 300kbits a 45% b 13% c 12% d 16% e 9% f 5% Why?

More information

Lempel-Ziv compression: how and why?

Lempel-Ziv compression: how and why? Lempel-Ziv compression: how and why? Algorithms on Strings Paweł Gawrychowski July 9, 2013 s July 9, 2013 2/18 Outline Lempel-Ziv compression Computing the factorization Using the factorization July 9,

More information

Image compression. Stefano Ferrari. Università degli Studi di Milano Methods for Image Processing. academic year

Image compression. Stefano Ferrari. Università degli Studi di Milano Methods for Image Processing. academic year Image compression Stefano Ferrari Università degli Studi di Milano stefano.ferrari@unimi.it Methods for Image Processing academic year 2017 2018 Data and information The representation of images in a raw

More information

Indexing. CS6200: Information Retrieval. Index Construction. Slides by: Jesse Anderton

Indexing. CS6200: Information Retrieval. Index Construction. Slides by: Jesse Anderton Indexing Index Construction CS6200: Information Retrieval Slides by: Jesse Anderton Motivation: Scale Corpus Terms Docs Entries A term incidence matrix with V terms and D documents has O(V x D) entries.

More information

CHAPTER II LITERATURE REVIEW

CHAPTER II LITERATURE REVIEW CHAPTER II LITERATURE REVIEW 2.1 BACKGROUND OF THE STUDY The purpose of this chapter is to study and analyze famous lossless data compression algorithm, called LZW. The main objective of the study is to

More information