CMPSC112 Lecture 37: Data Compression. Prof. John Wenskovitch 04/28/2017

Similar documents
5.5 Data Compression

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Algorithms 5.5 DATA COMPRESSION. basics run-length coding Huffman compression LZW compression

5.5 Data Compression

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Huffman Coding Assignment For CS211, Bellevue College (rev. 2016)

5.5 Data Compression. basics run-length coding Huffman compression LZW compression. Data compression

Data compression.

4.8 Huffman Codes. These lecture slides are supplied by Mathijs de Weerd

Data Compression. Guest lecture, SGDS Fall 2011

CS/COE 1501 cs.pitt.edu/~bill/1501/ Searching

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Lossless compression B 1 U 1 B 2 C R D! CSCI 470: Web Science Keith Vertanen

Greedy algorithms part 2, and Huffman code

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character.

CSE 421 Greedy: Huffman Codes

CS 206 Introduction to Computer Science II

Algorithms Dr. Haim Levkowitz

Data Structures and Algorithms

CSE100. Advanced Data Structures. Lecture 12. (Based on Paul Kube course materials)

CS/COE 1501

Analysis of Algorithms - Greedy algorithms -

CS/COE 1501

Multimedia Networking ECE 599

Lossless Compression Algorithms

ENSC Multimedia Communications Engineering Topic 4: Huffman Coding 2

Digital Communication Prof. Bikash Kumar Dey Department of Electrical Engineering Indian Institute of Technology, Bombay

Fundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding

MCS-375: Algorithms: Analysis and Design Handout #G2 San Skulrattanakulchai Gustavus Adolphus College Oct 21, Huffman Codes

Huffman Codes (data compression)

Greedy Algorithms and Huffman Coding

4/16/2012. Data Compression. Exhaustive search, backtracking, object-oriented Queens. Check out from SVN: Queens Huffman-Bailey.

Text Compression through Huffman Coding. Terminology

Data Compression Techniques

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character.

Greedy Algorithms. CLRS Chapters Introduction to greedy algorithms. Design of data-compression (Huffman) codes

CMPSCI 240 Reasoning Under Uncertainty Homework 4

Data Representation. Types of data: Numbers Text Audio Images & Graphics Video

EE 368. Weeks 5 (Notes)

Greedy Algorithms. Alexandra Stefan

An undirected graph is a tree if and only of there is a unique simple path between any 2 of its vertices.

COS 226 Algorithms and Data Structures Fall Final Solutions. 10. Remark: this is essentially the same question from the midterm.

Heaps and Priority Queues

CSE 143, Winter 2013 Programming Assignment #8: Huffman Coding (40 points) Due Thursday, March 14, 2013, 11:30 PM

CSE100. Advanced Data Structures. Lecture 13. (Based on Paul Kube course materials)

CIS 121 Data Structures and Algorithms with Java Spring 2018

Horn Formulae. CS124 Course Notes 8 Spring 2018

COMP251: Greedy algorithms

cs2010: algorithms and data structures

introduction run-length coding Huffman compression Applications

Binary Trees and Huffman Encoding Binary Search Trees

CSCI 136 Data Structures & Advanced Programming. Lecture 22 Fall 2018 Instructor: Bills

EE67I Multimedia Communication Systems Lecture 4

CS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding

Scribe: Virginia Williams, Sam Kim (2016), Mary Wootters (2017) Date: May 22, 2017

Digital Image Processing

6. Finding Efficient Compressions; Huffman and Hu-Tucker

A New Compression Method Strictly for English Textual Data

More Bits and Bytes Huffman Coding

Programming Abstractions

TREES Lecture 10 CS2110 Spring2014

Data Compression Scheme of Dynamic Huffman Code for Different Languages

CSE 143 Lecture 22. Huffman Tree

TU/e Algorithms (2IL15) Lecture 2. Algorithms (2IL15) Lecture 2 THE GREEDY METHOD

Chapter 9. Greedy Technique. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

Algorithms and Data Structures CS-CO-412

Design and Analysis of Algorithms

CS-301 Data Structure. Tariq Hanif

Huffman Coding. Version of October 13, Version of October 13, 2014 Huffman Coding 1 / 27

Greedy Approach: Intro

CS201 Discussion 12 HUFFMAN + ERDOSNUMBERS

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

6. Finding Efficient Compressions; Huffman and Hu-Tucker Algorithms

CS473-Algorithms I. Lecture 11. Greedy Algorithms. Cevdet Aykanat - Bilkent University Computer Engineering Department

CPE702 Algorithm Analysis and Design Week 7 Algorithm Design Patterns

R16 SET - 1 '' ''' '' ''' Code No: R

15 July, Huffman Trees. Heaps

Red-Black, Splay and Huffman Trees

Chapter 16: Greedy Algorithm

Ch. 2: Compression Basics Multimedia Systems

An Overview 1 / 10. CS106B Winter Handout #21 March 3, 2017 Huffman Encoding and Data Compression

CSE143X: Computer Programming I & II Programming Assignment #10 due: Friday, 12/8/17, 11:00 pm

Giri Narasimhan & Kip Irvine

Out: April 19, 2017 Due: April 26, 2017 (Wednesday, Reading/Study Day, no late work accepted after Friday)

CMSC 341 Lecture 16/17 Hashing, Parts 1 & 2

EXAMINATIONS 2005 END-YEAR. COMP 103 Introduction to Data Structures and Algorithms

Priority Queues. Chapter 9

Information Retrieval

Intro. To Multimedia Engineering Lossless Compression

COS 226 Algorithms and Data Structures Fall Final Exam

COSC-211: DATA STRUCTURES HW5: HUFFMAN CODING. 1 Introduction. 2 Huffman Coding. Due Thursday, March 8, 11:59pm

Data Compression 신찬수

DUKE UNIVERSITY Department of Computer Science. Test 2: CompSci 100

Building Java Programs. Priority Queues, Huffman Encoding

16 Greedy Algorithms

Priority Queues and Huffman Encoding

Priority Queues and Huffman Encoding

Course schedule. COMP9319 Web Data Compression and Search. Huffman coding. Original readings

COMP 103 Introduction to Data Structures and Algorithms

Transcription:

CMPSC112 Lecture 37: Data Compression Prof. John Wenskovitch 04/28/2017

What You Don t Get to Learn Self-balancing search trees: https://goo.gl/houquf https://goo.gl/r4osz2 Shell sort: https://goo.gl/awy3pk Really ridiculous sorting algorithms: Bogosort (a.k.a. Stupid Sort): https://en.wikipedia.org/wiki/bogosort Sleep Sort: https://rosettacode.org/wiki/sorting_algorithms/sleep_sort Cycle Sort: https://en.wikipedia.org/wiki/cycle_sort Slow Sort: http://wiki.c2.com/?slowsort Intelligent Design Sort: http://www.dangermouse.net/esoteric/intelligentdesignsort.html 04/28/2017 Data Compression 2

Last Time Tries Efficient manner of storing strings in a tree Each level of the trie represents another letter in the strings The value of a (key,value) pair marks the end of a word de la Briandais tries solve the issue with wasted space near the leaves of the trie 04/28/2017 Data Compression 3

Data Compression Problem Overview Compression: Input 1: A sequence of bits Output 1: A smaller sequence of bits Transforms a bitstream B into a compressed version C(B) Expansion Input 2: A sequence of bits Output 2: An expanded sequence of bits Transforms the compressed C(B) back into the original B 04/28/2017 Data Compression 4

Data Compression Challenges Compression must be reversible (lossless compression), or at least a very close approximation (lossy compression) Compression must be computationally efficient (time is obvious, what about space?) Compression must work for all inputs, not just a subset (working at the bit level) 04/28/2017 Data Compression 5

Data Compression Challenges Compression must work for all inputs, not just a subset. (really?) Theorem: No algorithm can compress every bitstream. Proof: Suppose that you have an algorithm that can compress every bitstream. Then, you could use the algorithm to compress its output to get a still shorter bitstream, and continue until you have a bitstream of length 0. This conclusion is absurd, therefore no algorithm can compress every bitstream. Compression [algorithms] must [run on] all inputs, not just a subset. 04/28/2017 Data Compression 6

Huffman Compression Idea: Use a small number of bits for frequent characters, and a large number of bits for rare characters. ABRACADABRA! In ASCII (7 bits per char): 1000001100001010100101000001100001110000 0110001001000001100001010100101000001010 0001 (84 bits) 04/28/2017 Data Compression 7

Huffman Compression ABRACADABRA! Variable-Length (A=0, B=1, R=00, C=01, D=10,!=11): 01000010100100011 (17 bits) Problem Not reversible, could also be CRRDDCRCB. Prefix-Free Variable-Length No character code is the prefix of another. (A=0, B=1111, C=110, D=100, R=1110,!=101): 011111110011001000111111100101 (30 bits) Now we just need to figure out how to generate prefix-free variable-length character codes 04/28/2017 Data Compression 8

Huffman Compression Tries 04/28/2017 Data Compression 9

Huffman Compression Construct Trie class Node implements Comparable<Node> { char ch; int freq; final Node left, right; Node(char c, int f, Node l, Node r) { ch = c; freq = f; left = l; right = r; } //Node (constructor) boolean isleaf() { return (left == null) && (right == null); } //isleft int compareto(node that) { return this.freq that.freq; } //compareto } //Node (class) 04/28/2017 Data Compression 10

Huffman Compression Construct Trie Node buildtrie(int[] freq) { MinPQ<Node> pq = min MinPQ<Node>(); for (char c = 0; c < R; c++) { if (freq[c] > 0) { pq.insert(new Node(c, freq[c], null, null)); } //if } //for while (pq.size() > 1) { Node x = pq.delmin(); Node y = pq.delmin(); Node parent = new Node( \0, x.freq + y.freq, x, y); pq.insert(parent); } //while return pq.delmin(); } //buildtrie 04/28/2017 Data Compression 11

Huffman Compression Construct Trie 04/28/2017 Data Compression 12

Huffman Compression Construct Trie 04/28/2017 Data Compression 13

Huffman Compression Analysis Theorem: For any prefix-free code, the length of the encoded bitstring is equal to the weighted external path length of the corresponding trie. Proof: From the construction of the trie: The depth of each leaf is the number of bits used to encode the character in the leaf. Thus, the weighted external path length is the length of the encoded bitstring the sum over all letters of the number of occurrences times the number of bits per occurrence. 04/28/2017 Data Compression 14

Any Questions? 04/28/2017 Data Compression 15