introduction run-length coding Huffman compression Applications
- Brooke Lang
Transcription
1 Algorithms, Fourth Edition. Robert Sedgewick and Kevin Wayne.
5.5 Data Compression: introduction, run-length coding, Huffman compression, LZW compression.
Last updated on 4//6 :4 PM

Data compression
Compression reduces the size of a file: to save space when storing it, and to save time when transmitting it. Most files have lots of redundancy.
"Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone." (IBM report on big data)

Applications
Generic file compression (always lossless). Files: GZIP, BZIP2, 7z. Archivers: PKZIP. File systems: NTFS, ZFS, HFS+, ReFS, GFS.
Multimedia (usually lossy). Images: GIF, JPEG. Sound: MP3. Video: MPEG, DivX, HDTV.
Communication. ITU-T T4 Group 3 Fax. V.42bis modem. Skype, Google hangout.
Databases. Google, Facebook, NSA, ...
2 Lossless compression and expansion
Message. Bitstream B we want to compress.
Compress. Generates a "compressed" representation C(B), which uses fewer bits (you hope).
Expand. Reconstructs the original bitstream B.
Basic model for data compression: bitstream B -> Compress -> compressed version C(B) -> Expand -> original bitstream B.
Compression ratio. Bits in C(B) / bits in B.
Ex. 50-75% or better compression ratio for natural language.

Compression before computers
Data compression has been omnipresent since antiquity: number systems, natural languages, mathematical notation.
It played a central role in communications technology: Grade 2 Braille, Morse code, the telephone system.
[Figure: the word "braille" written in Grade 2 Braille, where single cells stand for whole words such as "but", "rather", "a", "I", "like", "every".]

Data representation: genomic code
Genome. String over the alphabet { A, T, C, G }.
Goal. Encode an N-character genome: ATAGATGCATAG...
Standard ASCII encoding. 8 bits per char. 8 N bits.
    char   hex   binary
    'A'    41    01000001
    'T'    54    01010100
    'C'    43    01000011
    'G'    47    01000111
Two-bit encoding. 2 bits per char. 2 N bits (25% compression ratio).
    char   binary
    'A'    00
    'T'    01
    'C'    10
    'G'    11
Fixed-length code. A k-bit code supports an alphabet of size 2^k.

Reading and writing binary data
Binary standard input. Read bits from standard input.
    public class BinaryStdIn
        boolean readBoolean()     read 1 bit of data and return as a boolean
        char    readChar()        read 8 bits of data and return as a char
        char    readChar(int r)   read r bits of data and return as a char
        [similar methods for byte (8 bits); short (16 bits); int (32 bits); long and double (64 bits)]
        boolean isEmpty()         is the bitstream empty?
        void    close()           close the bitstream
Binary standard output. Write bits to standard output.
    public class BinaryStdOut
        void write(boolean b)       write the specified bit
        void write(char c)          write the specified 8-bit char
        void write(char c, int r)   write the r least significant bits of the specified char
        [similar methods for byte (8 bits); short (16 bits); int (32 bits); long and double (64 bits)]
        void close()                close the bitstream
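The two-bit genome encoding above can be sketched in plain Java without the BinaryStdIn/BinaryStdOut library. The class name GenomePack, the packing order (four bases per byte, most significant bits first), and the code assignment A=00, T=01, C=10, G=11 are assumptions made for illustration, not the lecture's implementation.

```java
class GenomePack {
    // Index in this string is the 2-bit code (assumed order: A=00, T=01, C=10, G=11).
    private static final String ALPHABET = "ATCG";

    // Pack a genome into bytes, 4 bases per byte, most significant bits first.
    static byte[] pack(String genome) {
        byte[] out = new byte[(genome.length() + 3) / 4];
        for (int i = 0; i < genome.length(); i++) {
            int code = ALPHABET.indexOf(genome.charAt(i));
            if (code < 0) throw new IllegalArgumentException("not a base: " + genome.charAt(i));
            int shift = 6 - 2 * (i % 4);
            out[i / 4] |= (byte) (code << shift);
        }
        return out;
    }

    // Recover the first n bases from a packed array.
    static String unpack(byte[] packed, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            int shift = 6 - 2 * (i % 4);
            int code = (packed[i / 4] >> shift) & 3;
            sb.append(ALPHABET.charAt(code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String genome = "ATAGATGCATAG";
        byte[] packed = pack(genome);
        System.out.println(8 * genome.length() + " bits as ASCII, " + 2 * genome.length() + " bits packed");
        System.out.println(unpack(packed, genome.length()));
    }
}
```

The genome length has to be stored separately, because the last byte may be only partly used.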
3 Writing binary data
Date representation. Three different ways to represent 12/31/1999.
A character stream (StdOut): 80 bits.
    StdOut.print(month + "/" + day + "/" + year);
Three ints (BinaryStdOut): 96 bits.
    BinaryStdOut.write(month);
    BinaryStdOut.write(day);
    BinaryStdOut.write(year);
A 4-bit field, a 5-bit field, and a 12-bit field (BinaryStdOut): 21 bits (+ 3 bits for byte alignment at close).
    BinaryStdOut.write(month, 4);
    BinaryStdOut.write(day, 5);
    BinaryStdOut.write(year, 12);

Binary dumps
Q. How can you examine the contents of a bitstream?
BinaryStdIn allows us to avoid such system dependencies by writing our own programs to convert bitstreams such that we can see them with our standard tools. For example, the program BinaryDump is a BinaryStdIn client that prints out the bits from standard input, encoded with the characters 0 and 1. This program is useful for debugging when working with small inputs. We use a slightly more complicated version that just prints the count when the width argument is 0 (see Exercise..X). The similar client HexDump groups the data into 8-bit bytes and prints each as two hexadecimal digits that each represent 4 bits. The client PictureDump displays the bits in a Picture. You can download HexDump and PictureDump from the booksite. Typically, we use piping and redirection at the command-line level when working with binary files: we can pipe the output of an encoder to BinaryDump, HexDump, or PictureDump, or redirect it to a file.
Printing a bitstream on standard (character) output (fragment of BinaryDump):
    if (cnt % width == 0) StdOut.println();
    if (BinaryStdIn.readBoolean()) StdOut.print("1");
    else                           StdOut.print("0");
    ...
    StdOut.println(cnt + " bits");
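The third date representation (a 4-bit month, a 5-bit day, and a 12-bit year) can be illustrated with ordinary shifts and masks. This is a minimal sketch that packs the fields into the low 21 bits of an int; the class and method names are hypothetical, and it does not use BinaryStdOut.

```java
class DateBits {
    // Layout (assumed for this sketch): month in bits 20..17, day in bits 16..12, year in bits 11..0.
    static int pack(int month, int day, int year) {
        return (month << 17) | (day << 12) | year;
    }

    static int month(int packed) { return (packed >>> 17) & 0xF; }
    static int day(int packed)   { return (packed >>> 12) & 0x1F; }
    static int year(int packed)  { return packed & 0xFFF; }

    public static void main(String[] args) {
        int p = pack(12, 31, 1999);
        System.out.println(month(p) + "/" + day(p) + "/" + year(p));
        System.out.println(8 * "12/31/1999".length() + " bits as characters, "
                + 3 * 32 + " bits as three ints, " + (4 + 5 + 12) + " bits as bit fields");
    }
}
```

The field widths limit the ranges: a 12-bit year only covers 0 through 4095, which is the kind of extra knowledge a compact encoding relies on.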
Four ways to look at a bitstream
Standard character stream:
    % more abra.txt
Bitstream represented as 0 and 1 characters (96 bits):
    % java BinaryDump 16 < abra.txt
Bitstream represented with hex digits (12 bytes):
    % java HexDump 4 < abra.txt
Bitstream represented as pixels in a Picture (96 bits; 16-by-6 pixel window, magnified):
    % java PictureDump 16 6 < abra.txt
[Table: hexadecimal-to-ASCII conversion table, not reproduced.]

Which of these formats are text-based, and which are binary? HTML, GIF, MPEG, PDF, SVG, Java source code, Java bytecode.

Universal data compression
ZeoSync. Announced 100:1 lossless compression of random data using Zero Space Tuner and BinaryAccelerator technology.
ZeoSync corporation folds after issuing $40 million in private stock.
4 Quotes from this interview
Wired News: When did you start working on this technology?
Peter St. George: I started developing the technology about a dozen years ago. I worked on this one problem for years consecutively. This is a project that I dedicated my life to a dozen years ago.
WN: Let's go into the details. Tell me how it works. It can compress random data?
PSG: If you say absolutely random, it's going to be very hard to agree what absolutely random is.
WN: How do you get around the conventional wisdom that says simple mathematics says it's impossible?
PSG: We plan to attack that issue head on. What hasn't been previously proven, we're proving. I have one quote I'd like to share with you: "The person who says it cannot be done should not interrupt the person doing it."

Universal data compression
Proposition. No algorithm can compress every bitstring.
Pf 1. [by contradiction] Suppose you have a universal data compression algorithm U that can compress every bitstream. Given bitstring B0, compress it to get a smaller bitstring B1. Compress B1 to get a smaller bitstring B2. Continue until reaching a bitstring of size 0. Implication: all bitstrings can be compressed to 0 bits!
Pf 2. [by counting] Suppose your algorithm can compress all 1,000-bit strings. There are 2^1000 possible bitstrings with 1,000 bits. Only 1 + 2 + 4 + ... + 2^998 + 2^999 < 2^1000 of them can be encoded with at most 999 bits. Similarly, only 1 in 2^499 bitstrings can be encoded with at most 500 bits.
[Figure: "Universal data compression?" A chain of compressors U applied repeatedly.]

Undecidability
Can you compress this string of decimal digits? It's the first digits of pi after the decimal point. (But how to compress?)
A difficult file to compress: one million (pseudo-)random bits.
    % java RandomBits | java PictureDump 2000 500
    1000000 bits

    public class RandomBits {
        public static void main(String[] args) {
            int x = 11111;
            for (int i = 0; i < 1000000; i++) {
                x = x * 314159 + 218281;
                BinaryStdOut.write(x > 0);
            }
            BinaryStdOut.close();
        }
    }
5 Rdenudcany in Enlgsih lnagugae
Q. How much redundancy in the English language?
A. Quite a bit.
"... randomising letters in the middle of words [has] little or no effect on the ability of skilled readers to understand the text. This is easy to denmtrasote. In a pubiltacion of New Scnieitst you could ramdinose all the letetrs, keipeng the first two and last two the same, and reibadailty would hadrly be aftcfeed. My ansaylis did not come to much beucase the thoery at the time was for shape and senqeuce retigcionon. Saberi's work sugsegts we may have some pofrweul palrlael prsooscers at work. The resaon for this is suerly that idnetiyfing coentnt by paarllel prseocsing speeds up regnicoiton. We only need the first and last two letetrs to spot chganes in meniang." (Graham Rawlinson)
The gaol of data cmperisoson is to inetdify rdenudcany and epxloit it.
Aside. Design an algorithm to correct text with letters permuted.

Data compression: quiz 1
Rank these in the order of compressibility:
A. An ASCII text file of Shakespeare's works.
B. A bitmap image of this slide.
C. An mp3 file of Justin Bieber's Baby.
[Answer choices A through D, each an ordering of the three files, are garbled in this transcription.] E. I don't know.
Compression is still an active area of research; big improvements are possible.

Next section: run-length coding.
6 Run-length encoding
Simple type of redundancy in a bitstream. Long runs of repeated bits (40 bits in this example).
Representation. 4-bit counts to represent alternating runs of 0s and 1s: 15 0s, then 7 1s, then 7 0s, then 11 1s.
    1111 0111 0111 1011     16 bits (instead of 40)
     15    7    7   11
Q. How many bits to store the counts?
A. We typically use 8 (but 4 in the example above for brevity).
Q. What to do when run length exceeds max count?
A. Intersperse runs of length 0.
Applications. JPEG, ITU-T T4 Group 3 Fax, ...

Data compression: quiz 2
What is the best compression ratio achievable from run-length coding when using 8-bit counts?
[Answer choices A through D are garbled in this transcription.] E. I don't know.

Next section: Huffman compression.

Variable-length codes
Use different number of bits to encode different chars. Assign shorter codes to more common chars.
Ex. Morse code.
Issue. Ambiguity: SOS? V7? IAMIE? EEWNI? (The codeword for S is a prefix of the codeword for V.)
In practice. Use a medium gap to separate codewords.
[Photo: David Huffman.]
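The run-length scheme above, including the trick of interspersing a run of length 0 when a run exceeds the maximum count, can be sketched as follows. For readability this version works on a String of '0' and '1' characters and returns the counts as a list of integers rather than writing 8-bit fields; those choices, and the names used, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class RunLengthSketch {
    static final int MAX = 255; // largest run an 8-bit count can hold

    // Encode as alternating run lengths, starting with a (possibly empty) run of 0s.
    static List<Integer> compress(String bits) {
        List<Integer> counts = new ArrayList<>();
        char current = '0';
        int run = 0;
        for (int i = 0; i < bits.length(); i++) {
            char b = bits.charAt(i);
            if (b != current) {        // run ended: emit it and switch to the other bit
                counts.add(run);
                run = 0;
                current = b;
            } else if (run == MAX) {   // run too long: emit it, then a zero-length run of the other bit
                counts.add(run);
                counts.add(0);
                run = 0;
            }
            run++;
        }
        counts.add(run);
        return counts;
    }

    // Rebuild the bit string from alternating run lengths.
    static String expand(List<Integer> counts) {
        StringBuilder sb = new StringBuilder();
        char current = '0';
        for (int count : counts) {
            for (int i = 0; i < count; i++) sb.append(current);
            current = (current == '0') ? '1' : '0';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = "0000000000000001111111000000011111111111"; // 15 0s, 7 1s, 7 0s, 11 1s
        System.out.println(compress(bits));
        System.out.println(expand(compress(bits)).equals(bits));
    }
}
```

With 8-bit counts each list entry costs 8 bits, so input with many short runs gets larger rather than smaller.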
7 Variable-length codes
Q. How do we avoid ambiguity?
A. Ensure that no codeword is a prefix of another.
Ex 1. Fixed-length code.
Ex 2. Append special stop character to each codeword.
Ex 3. General prefix-free code.

Prefix-free codes: trie representation
Q. How to represent the prefix-free code?
A. A binary trie. Characters in leaves. Codeword is path from root to leaf.
[Figure: two prefix-free codes, each shown as a codeword table, a trie, and the resulting compressed bitstring.]

Prefix-free codes: expansion
Expansion. Start at root. Go left if bit is 0; go right if 1. If leaf node, write character; return to root node; repeat.
Q. Why would this fail if the code isn't prefix-free?
A. Internal nodes also have chars, but decompressor will never output them.

Prefix-free codes: compression
Compression. Create an ST that maps each character to its codeword.
8 Data compression: quiz 3
Consider the following trie representation of a prefix-free code. Expand the compressed bitstring.
A. PEED
B. PESDEY
C. SPED
D. SPEEDY
E. I don't know.
[Figure: trie with leaves S, P, E, D, Y; the bitstring is not recoverable from this transcription.]

Huffman coding overview
Static model. Use the same prefix-free code for all messages.
Dynamic model. Use a custom prefix-free code for each message.
Compression. Read message. Build best prefix-free code for message. How? [ahead] Write prefix-free code (as a trie). Compress message using prefix-free code.
Expansion. Read prefix-free code (as a trie) from file. Read compressed message and expand using trie.

Prefix-free codes: how to transmit
Q. How to write the trie?
A. Write preorder traversal of trie; mark leaf and internal nodes with a bit.
[Figure: using preorder traversal to encode a trie as a bitstream, with leaves and internal nodes marked.]
Note. If message is long, overhead of transmitting trie is small.
Exercise: fill in the missing expressions (???).
    private static void writeTrie(Node x) {
        if (x.isLeaf()) {
            BinaryStdOut.write(true);
            BinaryStdOut.write(???);
            return;
        }
        BinaryStdOut.write(false);
        writeTrie(???);
        writeTrie(???);
    }

    private static class Node implements Comparable<Node> {
        private final char ch;          // used only for leaf nodes
        private final int freq;         // used only by compress()
        private final Node left, right;
        ...
    }
9 Prefix-free codes: how to transmit
Q. How to write the trie?
A. Write preorder traversal of trie; mark leaf and internal nodes with a bit.
    private static void writeTrie(Node x) {
        if (x.isLeaf()) {
            BinaryStdOut.write(true);
            BinaryStdOut.write(x.ch, 8);
            return;
        }
        BinaryStdOut.write(false);
        writeTrie(x.left);
        writeTrie(x.right);
    }

Q. How to read in the trie?
A. Reconstruct from preorder traversal of trie.
    private static Node readTrie() {
        if (BinaryStdIn.readBoolean()) {
            char c = BinaryStdIn.readChar(8);
            return new Node(c, 0, null, null);
        }
        Node x = readTrie();
        Node y = readTrie();
        return new Node('\0', 0, x, y);   // arbitrary value (not used with internal nodes)
    }

Huffman codes
Q. How to find best prefix-free code?
Huffman algorithm:
Count frequency freq[i] for each char i in input.
Start with one node corresponding to each char i (with weight freq[i]).
Repeat until single trie formed: select two tries with min weight freq[i] and freq[j]; merge into single trie with weight freq[i] + freq[j].
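The preorder encoding can be tried out without the BinaryStdIn/BinaryStdOut library by writing the bits into a String of '0' and '1' characters. This is a sketch under that assumption; the class TrieCodec, the simplified Node (no freq field), and the int[] position cursor are hypothetical choices for illustration.

```java
class TrieCodec {
    static final class Node {
        final char ch;            // used only for leaf nodes
        final Node left, right;
        Node(char ch, Node left, Node right) { this.ch = ch; this.left = left; this.right = right; }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Preorder traversal: '1' followed by the 8-bit char for a leaf, '0' for an internal node.
    static void writeTrie(Node x, StringBuilder out) {
        if (x.isLeaf()) {
            out.append('1');
            for (int i = 7; i >= 0; i--) out.append((x.ch >> i) & 1);
            return;
        }
        out.append('0');
        writeTrie(x.left, out);
        writeTrie(x.right, out);
    }

    // Rebuild the trie from the same preorder bit sequence; pos[0] is the read cursor.
    static Node readTrie(String bits, int[] pos) {
        if (bits.charAt(pos[0]++) == '1') {
            int c = 0;
            for (int i = 0; i < 8; i++) c = (c << 1) | (bits.charAt(pos[0]++) - '0');
            return new Node((char) c, null, null);
        }
        Node left = readTrie(bits, pos);
        Node right = readTrie(bits, pos);
        return new Node('\0', left, right);
    }

    public static void main(String[] args) {
        Node trie = new Node('\0', new Node('A', null, null),
                new Node('\0', new Node('B', null, null), new Node('C', null, null)));
        StringBuilder bits = new StringBuilder();
        writeTrie(trie, bits);
        System.out.println(bits + " (" + bits.length() + " bits)");
    }
}
```

Each leaf costs 9 bits and each internal node 1 bit, which is why the overhead is small relative to a long message.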
10-14 Huffman algorithm demo
Count frequency for each character in input.
Start with one node corresponding to each character, with weight equal to frequency.
Select two tries with min weight. Merge into single trie with cumulative weight. Repeat until a single trie remains.
[Demo figures not reproduced.]
15 Constructing a Huffman encoding trie: Java implementation
    private static Node buildTrie(int[] freq) {
        // initialize PQ with singleton tries
        MinPQ<Node> pq = new MinPQ<Node>();
        for (char i = 0; i < R; i++)
            if (freq[i] > 0)
                pq.insert(new Node(i, freq[i], null, null));

        // merge two smallest tries
        while (pq.size() > 1) {
            Node x = pq.delMin();
            Node y = pq.delMin();
            // '\0' is not used for internal nodes; x.freq + y.freq is the total frequency; x and y are the two subtries
            Node parent = new Node('\0', x.freq + y.freq, x, y);
            pq.insert(parent);
        }
        return pq.delMin();
    }

Practice
Construct the Huffman code for the following strings:
    aababcabcdabcde
    abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
[Solution tables for characters a b c d e and a b c d not reproduced.]
For the second string, each codeword uses 2 bits, so no compression (or expansion) of input. Small overhead due to need to store trie.

Huffman coding: overview
Compression, high-level steps:
Build prefix-free code for message: tabulate character frequencies; recursively merge two min weight tries.
Write prefix-free code (as a trie).
Compress message using prefix-free code: build symbol table from characters to codewords; output codeword for each character in input.
Expansion, high-level steps:
Read and decode prefix-free code (as a trie) from file.
Expand compressed message using trie: repeatedly find path from root to leaf in trie using bit sequence.
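The practice strings can be checked with a small self-contained sketch of the Huffman construction. It substitutes java.util.PriorityQueue for the MinPQ class used in the slides and counts frequencies with a HashMap instead of an int[R] array; the class HuffmanSketch and the helper encodedBits are hypothetical names. It assumes a non-empty input with at least two distinct characters.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class HuffmanSketch {
    static final class Node implements Comparable<Node> {
        final char ch; final int freq; final Node left, right;
        Node(char ch, int freq, Node left, Node right) {
            this.ch = ch; this.freq = freq; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
        public int compareTo(Node that) { return Integer.compare(this.freq, that.freq); }
    }

    // Tabulate frequencies, then repeatedly merge the two tries of minimum weight.
    static Node buildTrie(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet())
            pq.add(new Node(e.getKey(), e.getValue(), null, null));
        while (pq.size() > 1) {
            Node x = pq.poll();
            Node y = pq.poll();
            pq.add(new Node('\0', x.freq + y.freq, x, y));
        }
        return pq.poll();
    }

    // Build the symbol table from characters to codewords (left edge = 0, right edge = 1).
    static void buildCode(Node x, String prefix, Map<Character, String> code) {
        if (x.isLeaf()) { code.put(x.ch, prefix); return; }
        buildCode(x.left, prefix + '0', code);
        buildCode(x.right, prefix + '1', code);
    }

    // Number of bits in the encoded message, not counting the trie itself.
    static int encodedBits(String text) {
        Map<Character, String> code = new HashMap<>();
        buildCode(buildTrie(text), "", code);
        int bits = 0;
        for (char c : text.toCharArray()) bits += code.get(c).length();
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(encodedBits("aababcabcdabcde") + " bits");
    }
}
```

The individual codewords depend on how ties are broken, but the total encoded length does not, so the sketch reports only the bit count.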
16 Huffman compression summary
Proposition. Huffman's algorithm produces an optimal prefix-free code (no prefix-free code uses fewer bits).
Pf. See textbook.
Two-pass implementation (for compression). Pass 1: tabulate character frequencies; build trie. Pass 2: encode file by traversing trie (or symbol table).
Running time (for compression). Using a binary heap: N + R log R, where N is the input size and R is the alphabet size.
Running time (for expansion). Using a binary trie: N.
Q. Can we do better? [stay tuned]

Lossy vs. lossless compression
This lecture: lossless compression.
Images, music, videos, ...: lossy compression dramatically more effective.

Next section: LZW compression.

Statistical methods
Static model. Same model for all texts. Fast. Not optimal: different texts have different statistical properties. Ex: ASCII, Morse code.
Dynamic model. Generate model based on text. Preliminary pass needed to generate model. Must transmit the model. Ex: Huffman code.
Adaptive model. Progressively learn and update model as you read text. More accurate modeling produces better compression. Decoding must start from beginning. Ex: LZW.
[Photos: Abraham Lempel, Jacob Ziv.]
17 LZW compression demo
[Demo figure: input, matches, LZW compression output, and codeword table; stop char: 80.]

Lempel-Ziv-Welch compression
LZW compression.
Create ST mapping string keys to W-bit codewords.
Initialize ST with codewords for single-character keys.
Find longest string s in ST that is a prefix of unscanned part of input.
Write the W-bit codeword associated with s.
Add s + c to ST, where c is next character in the input.
Q. How to represent LZW compression code table?
A. A trie to support longest prefix match.

LZW expansion demo
[Demo figure: codewords, LZW expansion output, and codeword table.]

LZW expansion
Create ST mapping W-bit keys to string values.
Initialize ST to contain single-character values.
Read a W-bit key.
Find associated string value in ST and write it out.
Update ST.
Q. How to represent LZW expansion code table?
A. An array of length 2^W.
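The compression steps above can be sketched as follows. To stay self-contained, this version uses a HashMap for the code table instead of a trie, returns the codewords as a list of integers rather than writing W-bit fields, and assumes a 7-bit ASCII input alphabet (R = 128), stop codeword 0x80, and W = 8; all of these are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LzwCompressSketch {
    static final int R = 128;   // input alphabet size (7-bit ASCII, assumed)
    static final int EOF = R;   // codeword 0x80 marks the end of the message
    static final int L = 256;   // number of codewords = 2^W with W = 8 (assumed)

    static List<Integer> compress(String input) {
        Map<String, Integer> st = new HashMap<>();
        for (int i = 0; i < R; i++) st.put(String.valueOf((char) i), i);
        int next = R + 1;       // first free codeword, 0x81
        List<Integer> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            // Find the longest string s in ST that is a prefix of the unscanned input.
            int end = pos + 1;
            while (end < input.length() && st.containsKey(input.substring(pos, end + 1))) end++;
            out.add(st.get(input.substring(pos, end)));
            // Add s + c to ST, where c is the next character, if codewords remain.
            if (end < input.length() && next < L) st.put(input.substring(pos, end + 1), next++);
            pos = end;
        }
        out.add(EOF);
        return out;
    }

    public static void main(String[] args) {
        for (int code : compress("ABRACADABRABRABRA")) System.out.print(Integer.toHexString(code) + " ");
        System.out.println();
    }
}
```

Extending the match one character at a time works because every multi-character key was added as an existing key plus one character, so every prefix of a key is also a key. The repeated substring calls make this version slower than a trie-based table.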
18 Data compression: quiz 4
What is the LZW compression of the given string? [The string and answer choices A through D are garbled in this transcription.] E. I don't know.

LZW tricky case: compression
[Demo figure: input, matches, LZW compression output, and codeword table.]

LZW tricky case: expansion
[Demo figure: codewords, LZW expansion output, and codeword table.]
Need to know code for 83 before it is in codeword table.
We can deduce that the code for 83 is ABx for some character x.
Now, we have deduced x.

LZW implementation details
How big to make ST? How long is message? Whole message similar model? [many other variations]
What to do when ST fills up? Throw away and start over. [GIF] Throw away when not effective. [Unix compress] [many other variations]
Why not put longer substrings in ST? [many variations have been developed]
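The tricky case shows up in expansion when a codeword arrives one step before the expander has added it to its table. A minimal sketch, assuming the same parameters as a typical 8-bit demo (R = 128 ASCII characters, stop codeword 0x80, table of 256 entries) and an input array that ends with the stop codeword; the class name is hypothetical.

```java
class LzwExpandSketch {
    static final int R = 128;   // input alphabet size (7-bit ASCII, assumed)
    static final int EOF = R;   // codeword 0x80 marks the end of the message
    static final int L = 256;   // table size = 2^W with W = 8 (assumed)

    // Expand a codeword sequence that is terminated by EOF.
    static String expand(int[] codes) {
        String[] st = new String[L];
        for (int i = 0; i < R; i++) st[i] = String.valueOf((char) i);
        int next = R + 1;       // first free codeword, 0x81
        StringBuilder out = new StringBuilder();
        if (codes.length == 0 || codes[0] == EOF) return "";
        String val = st[codes[0]];
        for (int k = 1; ; k++) {
            out.append(val);
            int code = codes[k];
            if (code == EOF) break;
            String s = st[code];
            // Tricky case: the codeword is the one about to be defined,
            // so its string must be val followed by val's own first character.
            if (code == next) s = val + val.charAt(0);
            if (next < L) st[next++] = val + s.charAt(0);
            val = s;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand(new int[] { 0x41, 0x42, 0x81, 0x83, 0x80 }));
    }
}
```

In the example, codeword 83 is read when the table only goes up to 82; the previous string is AB, so 83 must be AB followed by A.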
19 LZW in the real world
Lempel-Ziv and friends. LZ77. LZ78. LZW. Deflate / zlib = LZ77 variant + Huffman.
LZ77: not patented (widely used in open source). LZW: previously under patent.
Unix compress, GIF, TIFF, V.42bis modem: LZW.
zip, 7zip, gzip, jar, png, pdf: deflate / zlib.
iPhone, Wii, Apache HTTP server: deflate / zlib.

Lossless data compression benchmarks
Data compression using Calgary corpus (year, scheme, bits / char). The table runs from 1967 ASCII (7 bits / char) through Huffman, LZ77, LZMW, LZH, move-to-front, LZB, gzip, PPMC, SAKDC, PPM, Burrows-Wheeler [next programming assignment], and BOA, to RK (1.89 bits / char). The remaining values are not recoverable from this transcription.

Data compression summary
Lossless compression. Represent fixed-length symbols with variable-length codes. [Huffman] Represent variable-length symbols with fixed-length codes. [LZW]
Lossy compression. [not covered in this course] JPEG, MPEG, MP3, ... FFT/DCT, wavelets, fractals, ...
Theoretical limits on compression. Shannon entropy: H(X) = -\sum_{i}^{n} p(x_i) \lg p(x_i)
Practical compression. Exploit extra knowledge whenever possible.
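The entropy formula can be evaluated on a concrete string by using character frequencies as the probabilities p(x_i). This is only an order-0 estimate (it ignores context between characters), and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class EntropySketch {
    // Empirical order-0 entropy in bits per character: -sum p * lg p over the distinct characters.
    static double entropy(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);
        double h = 0.0;
        for (int count : freq.values()) {
            double p = (double) count / text.length();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(entropy("abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcd") + " bits per char");
        System.out.println(entropy("aababcabcdabcde") + " bits per char");
    }
}
```

For four equally likely characters the estimate is 2 bits per character, which matches the 2-bit Huffman codewords for such input.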
More informationDavid Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc.
David Rappaport School of Computing Queen s University CANADA Copyright, 1996 Dale Carnegie & Associates, Inc. Data Compression There are two broad categories of data compression: Lossless Compression
More informationG64PMM - Lecture 3.2. Analogue vs Digital. Analogue Media. Graphics & Still Image Representation
G64PMM - Lecture 3.2 Graphics & Still Image Representation Analogue vs Digital Analogue information Continuously variable signal Physical phenomena Sound/light/temperature/position/pressure Waveform Electromagnetic
More informationCompression; Error detection & correction
Compression; Error detection & correction compression: squeeze out redundancy to use less memory or use less network bandwidth encode the same information in fewer bits some bits carry no information some
More information1.1. INTRODUCTION 1.2. NUMBER SYSTEMS
Chapter 1. 1.1. INTRODUCTION Digital computers have brought about the information age that we live in today. Computers are important tools because they can locate and process enormous amounts of information
More informationLZW Compression. Ramana Kumar Kundella. Indiana State University December 13, 2014
LZW Compression Ramana Kumar Kundella Indiana State University rkundella@sycamores.indstate.edu December 13, 2014 Abstract LZW is one of the well-known lossless compression methods. Since it has several
More informationIMAGE COMPRESSION- I. Week VIII Feb /25/2003 Image Compression-I 1
IMAGE COMPRESSION- I Week VIII Feb 25 02/25/2003 Image Compression-I 1 Reading.. Chapter 8 Sections 8.1, 8.2 8.3 (selected topics) 8.4 (Huffman, run-length, loss-less predictive) 8.5 (lossy predictive,
More informationEngineering Mathematics II Lecture 16 Compression
010.141 Engineering Mathematics II Lecture 16 Compression Bob McKay School of Computer Science and Engineering College of Engineering Seoul National University 1 Lossless Compression Outline Huffman &
More informationMultimedia Networking ECE 599
Multimedia Networking ECE 599 Prof. Thinh Nguyen School of Electrical Engineering and Computer Science Based on B. Lee s lecture notes. 1 Outline Compression basics Entropy and information theory basics
More informationLossless Compression Algorithms
Multimedia Data Compression Part I Chapter 7 Lossless Compression Algorithms 1 Chapter 7 Lossless Compression Algorithms 1. Introduction 2. Basics of Information Theory 3. Lossless Compression Algorithms
More informationChapter 7 Lossless Compression Algorithms
Chapter 7 Lossless Compression Algorithms 7.1 Introduction 7.2 Basics of Information Theory 7.3 Run-Length Coding 7.4 Variable-Length Coding (VLC) 7.5 Dictionary-based Coding 7.6 Arithmetic Coding 7.7
More informationData Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression
An overview of Compression Multimedia Systems and Applications Data Compression Compression becomes necessary in multimedia because it requires large amounts of storage space and bandwidth Types of Compression
More informationLecture 6 Review of Lossless Coding (II)
Shujun LI (李树钧): INF-10845-20091 Multimedia Coding Lecture 6 Review of Lossless Coding (II) May 28, 2009 Outline Review Manual exercises on arithmetic coding and LZW dictionary coding 1 Review Lossy coding
More informationDigital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay
Digital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 26 Source Coding (Part 1) Hello everyone, we will start a new module today
More informationChapter 2 Bits, Data Types, and Operations
Chapter Bits, Data Types, and Operations How do we represent data in a computer? At the lowest level, a computer is an electronic machine. works by controlling the flow of electrons Easy to recognize two
More informationFundamentals of Programming
Fundamentals of Programming Lecture 2 Number Systems & Arithmetic Lecturer : Ebrahim Jahandar Some Parts borrowed from slides by IETC1011-Yourk University Common Number Systems System Base Symbols Used
More informationIntroduction to Data Compression
Introduction to Data Compression Guillaume Tochon guillaume.tochon@lrde.epita.fr LRDE, EPITA Guillaume Tochon (LRDE) CODO - Introduction 1 / 9 Data compression: whatizit? Guillaume Tochon (LRDE) CODO -
More information7: Image Compression
7: Image Compression Mark Handley Image Compression GIF (Graphics Interchange Format) PNG (Portable Network Graphics) MNG (Multiple-image Network Graphics) JPEG (Join Picture Expert Group) 1 GIF (Graphics
More informationECE 499/599 Data Compression & Information Theory. Thinh Nguyen Oregon State University
ECE 499/599 Data Compression & Information Theory Thinh Nguyen Oregon State University Adminstrivia Office Hours TTh: 2-3 PM Kelley Engineering Center 3115 Class homepage http://www.eecs.orst.edu/~thinhq/teaching/ece499/spring06/spring06.html
More informationOverview. Last Lecture. This Lecture. Next Lecture. Data Transmission. Data Compression Source: Lecture notes
Overview Last Lecture Data Transmission This Lecture Data Compression Source: Lecture notes Next Lecture Data Integrity 1 Source : Sections 10.1, 10.3 Lecture 4 Data Compression 1 Data Compression Decreases
More informationDEFLATE COMPRESSION ALGORITHM
DEFLATE COMPRESSION ALGORITHM Savan Oswal 1, Anjali Singh 2, Kirthi Kumari 3 B.E Student, Department of Information Technology, KJ'S Trinity College Of Engineering and Research, Pune, India 1,2.3 Abstract
More informationCS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding
CS6B Handout 34 Autumn 22 November 2 th, 22 Data Compression and Huffman Encoding Handout written by Julie Zelenski. In the early 98s, personal computers had hard disks that were no larger than MB; today,
More informationCategory: Informational May DEFLATE Compressed Data Format Specification version 1.3
Network Working Group P. Deutsch Request for Comments: 1951 Aladdin Enterprises Category: Informational May 1996 DEFLATE Compressed Data Format Specification version 1.3 Status of This Memo This memo provides
More informationVC 12/13 T16 Video Compression
VC 12/13 T16 Video Compression Mestrado em Ciência de Computadores Mestrado Integrado em Engenharia de Redes e Sistemas Informáticos Miguel Tavares Coimbra Outline The need for compression Types of redundancy
More informationAn Overview 1 / 10. CS106B Winter Handout #21 March 3, 2017 Huffman Encoding and Data Compression
CS106B Winter 2017 Handout #21 March 3, 2017 Huffman Encoding and Data Compression Handout by Julie Zelenski with minor edits by Keith Schwarz In the early 1980s, personal computers had hard disks that
More informationEE67I Multimedia Communication Systems Lecture 4
EE67I Multimedia Communication Systems Lecture 4 Lossless Compression Basics of Information Theory Compression is either lossless, in which no information is lost, or lossy in which information is lost.
More informationDictionary techniques
Dictionary techniques The final concept that we will mention in this chapter is about dictionary techniques. Many modern compression algorithms rely on the modified versions of various dictionary techniques.
More informationDigital Image Processing
Digital Image Processing Image Compression Caution: The PDF version of this presentation will appear to have errors due to heavy use of animations Material in this presentation is largely based on/derived
More informationChapter 2 Bits, Data Types, and Operations
Chapter 2 Bits, Data Types, and Operations Original slides from Gregory Byrd, North Carolina State University Modified slides by Chris Wilcox, Colorado State University How do we represent data in a computer?!
More informationWelcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson
Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson zhuyongxin@sjtu.edu.cn 2 Lossless Compression Algorithms 7.1 Introduction 7.2 Basics of Information
More informationFundamentals of Programming (C)
Borrowed from lecturer notes by Omid Jafarinezhad Fundamentals of Programming (C) Group 8 Lecturer: Vahid Khodabakhshi Lecture Number Systems Department of Computer Engineering Outline Numeral Systems
More informationNumber Systems for Computers. Outline of Introduction. Binary, Octal and Hexadecimal numbers. Issues for Binary Representation of Numbers
Outline of Introduction Administrivia What is computer architecture? What do computers do? Representing high level things in binary Data objects: integers, decimals, characters, etc. Memory locations (We
More informationData Compression Techniques
Data Compression Techniques Part 1: Entropy Coding Lecture 1: Introduction and Huffman Coding Juha Kärkkäinen 31.10.2017 1 / 21 Introduction Data compression deals with encoding information in as few bits
More informationCS/ECE 252: INTRODUCTION TO COMPUTER ENGINEERING UNIVERSITY OF WISCONSIN MADISON
CS/ECE 252: INTRODUCTION TO COMPUTER ENGINEERING UNIVERSITY OF WISCONSIN MADISON Prof. Gurindar Sohi TAs: Junaid Khalid and Pradip Vallathol Midterm Examination 1 In Class (50 minutes) Friday, September
More informationData Representation and Binary Arithmetic. Lecture 2
Data Representation and Binary Arithmetic Lecture 2 Computer Data Data is stored as binary; 0 s and 1 s Because two-state ( 0 & 1 ) logic elements can be manufactured easily Bit: binary digit (smallest
More informationStudy of LZ77 and LZ78 Data Compression Techniques
Study of LZ77 and LZ78 Data Compression Techniques Suman M. Choudhary, Anjali S. Patel, Sonal J. Parmar Abstract Data Compression is defined as the science and art of the representation of information
More informationData Representa5on. CSC 2400: Computer Systems. What kinds of data do we need to represent?
CSC 2400: Computer Systems Data Representa5on What kinds of data do we need to represent? - Numbers signed, unsigned, integers, floating point, complex, rational, irrational, - Text characters, strings,
More informationChapter 2 Bits, Data Types, and Operations
Chapter 2 Bits, Data Types, and Operations Original slides from Gregory Byrd, North Carolina State University Modified by Chris Wilcox, S. Rajopadhye Colorado State University How do we represent data
More information5/17/2009. Digitizing Discrete Information. Ordering Symbols. Analog vs. Digital
Chapter 8: Bits and the "Why" of Bytes: Representing Information Digitally Digitizing Discrete Information Fluency with Information Technology Third Edition by Lawrence Snyder Copyright 2008 Pearson Education,
More informationCSE100. Advanced Data Structures. Lecture 13. (Based on Paul Kube course materials)
CSE100 Advanced Data Structures Lecture 13 (Based on Paul Kube course materials) CSE 100 Priority Queues in Huffman s algorithm Heaps and Priority Queues Time and space costs of coding with Huffman codes
More informationMore Bits and Bytes Huffman Coding
More Bits and Bytes Huffman Coding Encoding Text: How is it done? ASCII, UTF, Huffman algorithm ASCII C A T Lawrence Snyder, CSE UTF-8: All the alphabets in the world Uniform Transformation Format: a variable-width
More informationDigital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay
Digital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 29 Source Coding (Part-4) We have already had 3 classes on source coding
More informationImage coding and compression
Image coding and compression Robin Strand Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Today Information and Data Redundancy Image Quality Compression Coding
More information15 Data Compression 2014/9/21. Objectives After studying this chapter, the student should be able to: 15-1 LOSSLESS COMPRESSION
15 Data Compression Data compression implies sending or storing a smaller number of bits. Although many methods are used for this purpose, in general these methods can be divided into two broad categories:
More informationGreedy Algorithms CHAPTER 16
CHAPTER 16 Greedy Algorithms In dynamic programming, the optimal solution is described in a recursive manner, and then is computed ``bottom up''. Dynamic programming is a powerful technique, but it often
More informationData Storage. Slides derived from those available on the web site of the book: Computer Science: An Overview, 11 th Edition, by J.
Data Storage Slides derived from those available on the web site of the book: Computer Science: An Overview, 11 th Edition, by J. Glenn Brookshear Copyright 2012 Pearson Education, Inc. Data Storage Bits
More informationFundamentals of Video Compression. Video Compression
Fundamentals of Video Compression Introduction to Digital Video Basic Compression Techniques Still Image Compression Techniques - JPEG Video Compression Introduction to Digital Video Video is a stream
More informationData Representation From 0s and 1s to images CPSC 101
Data Representation From 0s and 1s to images CPSC 101 Learning Goals After the Data Representation: Images unit, you will be able to: Recognize and translate between binary and decimal numbers Define bit,
More informationCS/ECE 252: INTRODUCTION TO COMPUTER ENGINEERING UNIVERSITY OF WISCONSIN MADISON
CS/ECE 252: INTRODUCTION TO COMPUTER ENGINEERING UNIVERSITY OF WISCONSIN MADISON Prof. Gurindar Sohi TAs: Pradip Vallathol and Junaid Khalid Midterm Examination 1 In Class (50 minutes) Friday, September
More informationUnicode. Standard Alphanumeric Formats. Unicode Version 2.1 BCD ASCII EBCDIC
Standard Alphanumeric Formats Unicode BCD ASCII EBCDIC Unicode Next slides 16-bit standard Developed by a consortia Intended to supercede older 7- and 8-bit codes Unicode Version 2.1 1998 Improves on version
More informationENSC Multimedia Communications Engineering Topic 4: Huffman Coding 2
ENSC 424 - Multimedia Communications Engineering Topic 4: Huffman Coding 2 Jie Liang Engineering Science Simon Fraser University JieL@sfu.ca J. Liang: SFU ENSC 424 1 Outline Canonical Huffman code Huffman
More informationData Representa5on. CSC 2400: Computer Systems. What kinds of data do we need to represent?
CSC 2400: Computer Systems Data Representa5on What kinds of data do we need to represent? - Numbers signed, unsigned, integers, floating point, complex, rational, irrational, - Text characters, strings,
More informationCS341 *** TURN OFF ALL CELLPHONES *** Practice NAME
CS341 *** TURN OFF ALL CELLPHONES *** Practice Final Exam B. Wilson NAME OPEN BOOK / OPEN NOTES: I GIVE PARTIAL CREDIT! SHOW ALL WORK! 1. Processor Architecture (20 points) a. In a Harvard architecture
More informationNumber Representations
Simple Arithmetic [Arithm Notes] Number representations Signed numbers Sign-magnitude, ones and twos complement Arithmetic Addition, subtraction, negation, overflow MIPS instructions Logic operations MIPS
More information