MCS-375: Algorithms: Analysis and Design                        Handout #G2
San Skulrattanakulchai
Gustavus Adolphus College                                       Oct 21, 2016

Huffman Codes

CLRS: Ch 16.3

Ziv-Lempel is the most popular compression algorithm today. It is an adaptive algorithm, i.e., the code changes throughout the message. There are several variants of Ziv-Lempel, many of which are proprietary. Example applications using this compression method are Unix compress, gzip, and Windows WinZip. The encoded string has length proportional to the entropy, i.e., the information content, of the source string.

Huffman's was an early compression algorithm. It is a non-adaptive code. The algorithm is implemented as Unix System V's pack and compact (adaptive Huffman). It is also used within bzip, JPEG, and MPEG.

Prefix-free codes

An alphabet of n characters can be encoded in strings that are ⌈log n⌉ bits long, e.g., for alphabet {a, b, c, d} we can use strings 00, 01, 10, 11 as code words for a, b, c, and d, respectively. Examples of such fixed-length codes in general use are ASCII, ISO 8859-1 (Latin-1), and Unicode.

To encode a given text, we can usually achieve shorter total length by giving shorter code words to frequently occurring characters. For example, if a occurs 51 times, and b, c, and d each occur 1 time, we can use code words 0 for a, 10 for b, 110 for c, and 111 for d.

The code should be prefix-free, i.e., no code word is a prefix of another. This condition is necessary and sufficient for an instantaneous code, i.e., one with the property that the encoded message may be decoded on the fly while scanning through it.

A prefix-free code corresponds to a binary tree where each leaf represents a character of the alphabet. The code word for each character is determined by the character's root-to-leaf path, with a left child corresponding to 0 and a right child corresponding to 1.
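To make the instantaneous decoding concrete, here is a small Python sketch (mine, not part of the original handout; the names code and decode are my own) that decodes a bit string under the prefix-free code 0, 10, 110, 111 from the example above. Because no code word is a prefix of another, the first match found while scanning is always the correct one.

    # Hypothetical illustration: on-the-fly decoding with the prefix-free
    # code a -> 0, b -> 10, c -> 110, d -> 111 from the example above.
    code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
    word_to_char = {w: ch for ch, w in code.items()}

    def decode(bits):
        message, word = [], ''
        for bit in bits:
            word += bit
            if word in word_to_char:        # prefix-freeness: a match is final
                message.append(word_to_char[word])
                word = ''
        assert word == '', 'bit string ended in the middle of a code word'
        return ''.join(message)

    print(decode('01001100111'))            # -> 'abacad'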

         T_1                     T_2

          *                       *
       0/   \1                 0/   \1
       *     *                 a     *
     0/ \1 0/ \1                   0/ \1
     a   b c   d                   b   *
                                     0/ \1
                                     c   d

For a prefix-free code, any message has exactly one decoding.

Problem 1. Given a text, find the prefix-free code giving the shortest encoding.

Example. In abacad, a occurs 3 times, and each of b, c, d occurs once. T_1 gives encoding length (3 × 2) + 2 + 2 + 2 = 12, while T_2 gives encoding length (3 × 1) + 2 + 3 + 3 = 11.

Path length in trees. Consider a tree T with leaves L. Let d(x) denote the depth of node x. Define the external path length of T to be the sum of the depths of all leaves, Σ { d(l) : l ∈ L }. Let each leaf l have weight w(l). Define the weighted external path length (w.e.p.l.) P(T) of T to be the sum of the weighted depths of all leaves,

    P(T) = Σ { w(l) · d(l) : l ∈ L }.

Example. Consider trees T_1 and T_2 above. Setting w(a) = 3 and w(b) = w(c) = w(d) = 1, we find P(T_1) = 12 and P(T_2) = 11. Note that the w.e.p.l. then equals the encoding length.
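As a quick sanity check of these numbers (my own sketch, not in the handout), the w.e.p.l. can be computed directly from (weight, depth) pairs read off the trees:

    # My own check: w.e.p.l. as the sum of w(l) * d(l) over the leaves.
    def wepl(leaves):
        """leaves: iterable of (weight, depth) pairs."""
        return sum(w * d for w, d in leaves)

    # T_1: all four leaves at depth 2.  T_2: a at depth 1, b at 2, c and d at 3.
    print(wepl([(3, 2), (1, 2), (1, 2), (1, 2)]))   # 12
    print(wepl([(3, 1), (1, 2), (1, 3), (1, 3)]))   # 11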

Problem 2. Given a set of items I and a nonnegative weight function w on the items, find a binary tree T with leaves I having minimum w.e.p.l.

Problem 1 reduces to Problem 2. To solve Problem 1, we scan the text, counting each character's frequency. We then solve Problem 2, defining the weight of a character to be its frequency. As illustrated above, the w.e.p.l. equals the encoding length.

Optimum Merge Trees

We wish to merge 2 sorted files into 1 sorted file.

Example 1. Merging 2, 4, 8, 16 and 1, 3, 9, 10, 11 gives 1, 2, 3, 4, 8, 9, 10, 11, 16.

A merging algorithm works by comparing the first numbers of both files, outputting the smaller number, and recursively merging the remaining files. The time to merge a file of size m with a file of size n is O(m + n). (A code sketch of this merge follows Example 2 below.)

The General Merging Problem

We are given n sorted files of varying lengths. We wish to merge them into 1 sorted file by repeatedly merging 2 files until 1 file remains. A sequence of merges corresponds to a binary tree.

Example 2. Here are 2 ways to merge 3 files of lengths 10, 20, 30:

          *                         *
         / \                       / \
        *   30                   10   *
       / \                           / \
      10  20                       20   30

    1st merge takes time 30     1st merge takes time 50
    2nd merge takes time 60     2nd merge takes time 60
    total merge time 90         total merge time 110
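Returning to Example 1, the two-file merge can be sketched in Python as follows (a minimal version of my own; a real implementation would stream from files rather than take lists). Each output element is produced in O(1) time, which gives the O(m + n) bound.

    # A minimal sketch (mine) of the O(m + n) two-file merge.
    def merge(xs, ys):
        out, i, j = [], 0, 0
        while i < len(xs) and j < len(ys):
            if xs[i] <= ys[j]:              # output the smaller front element
                out.append(xs[i]); i += 1
            else:
                out.append(ys[j]); j += 1
        out.extend(xs[i:])                  # at most one of these is nonempty
        out.extend(ys[j:])
        return out

    print(merge([2, 4, 8, 16], [1, 3, 9, 10, 11]))
    # [1, 2, 3, 4, 8, 9, 10, 11, 16]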

Definition. An optimum merge pattern (optimum merge tree) minimizes the total merge time, i.e., it minimizes the total length of all files output during the merging.

Problem 3. Given the lengths of n sorted files, find an optimum merge pattern.

Lemma. Optimum merge trees minimize w.e.p.l.

Proof Sketch. Consider a binary tree T with leaves L and weights w(l), l ∈ L. Interpreting T as a merge tree, define the weight of an interior node to be the sum of the weights of its children. (The weight of an interior node is the time for the merge at that node.)

Claim: the w.e.p.l. equals the total interior node weight (i.e., the total merge time), since a leaf l lies below exactly d(l) interior nodes and so contributes w(l) · d(l) to both quantities. The lemma follows from the claim.

Problem 4. Given a set of items I and a nonnegative weight function w on the items, find a binary tree T with leaves I having minimum total interior node weight.

By the definition of interior node weight, we see that Problem 3 reduces to Problem 4. We previously showed that Problem 1 reduces to Problem 2. The above lemma shows that Problem 2 is equivalent to Problem 4. So when designing a solution or proving its correctness, we may regard it as a solution to either Problem 2 or Problem 4.

Greedy algorithm for merge patterns

Problem 3 has an obvious greedy algorithm: repeatedly merge the 2 shortest files. We will show this greedy algorithm is correct.
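Before the correctness argument, here is a quick numeric check (mine, not in the handout) of the claim in the lemma's proof sketch, using the left merge tree of Example 2, where leaves 10 and 20 sit at depth 2 and leaf 30 at depth 1:

    # My own check that w.e.p.l. equals total interior node weight
    # on the left merge tree of Example 2.
    interior_weight = 30 + 60               # the two merges performed
    wepl = 10*2 + 20*2 + 30*1               # each leaf contributes w(l) * d(l)
    print(interior_weight, wepl)            # 90 90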

Correctness

Theorem. The optimum merge pattern algorithm (i.e., Huffman's algorithm) is correct.

Lemma 1. Some optimal tree has the 2 smallest items as the 2 sibling leaves of maximum depth.

Proof Sketch. Consider any binary tree whose leaves are the items. Suppose the 2 smallest items are not siblings of maximum depth. Swap them into that position. The swap does not increase the w.e.p.l. So if we start with an optimum tree, we get another optimum tree that satisfies the lemma.

To complete the proof of correctness, we must show that after Huffman's algorithm combines the 2 smallest items, it can proceed recursively.

Proof.
1. The first merge it performs is valid, by Lemma 1.
2. After the first merge, the remaining problem is to find another optimum merge pattern, so the same strategy works!

So Huffman's algorithm is correct.

Huffman's Algorithm

    let a and b be 2 items of smallest weight
    replace them by an item p of weight w(a) + w(b)
    recursively find the minimum tree R for the new items
    return the tree R with a and b attached as children of p

Example. Suppose we are given six items with weights 2, 3, 3, 4, 6, 12. [The figure showing the run of the algorithm is missing from this transcription.]

Implementation and Timing
1. For efficiency we implement Huffman's algorithm iteratively rather than recursively.
2. Instead of maintaining the items and the combined items as a sorted list as in the figure, we maintain them in a priority queue.
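Since the figure is missing from the transcription, here is a short trace of my own of the merges the algorithm performs on these weights, using Python's heapq as the priority queue:

    # My own trace of the merges on the example weights 2, 3, 3, 4, 6, 12
    # (the handout's figure is missing from this transcription).
    import heapq

    items = [2, 3, 3, 4, 6, 12]
    heapq.heapify(items)
    while len(items) > 1:
        a = heapq.heappop(items)            # smallest weight
        b = heapq.heappop(items)            # second smallest weight
        print('merge', a, 'and', b, '->', a + b)
        heapq.heappush(items, a + b)

    # Output:
    # merge 2 and 3 -> 5
    # merge 3 and 4 -> 7
    # merge 5 and 6 -> 11
    # merge 7 and 11 -> 18
    # merge 12 and 18 -> 30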

Pseudocode for Huffman's Algorithm

    Q ← empty priority queue
    for i ← 1 to n do
        INSERT item i into Q
    for i ← 1 to n - 1 do {
        a ← EXTRACT-MIN(Q)
        b ← EXTRACT-MIN(Q)
        combine a and b into a new item, and INSERT it into Q
    }
    /* now Q contains 1 item, representing the Huffman tree */

Total time = O(n) × O(log n) = O(n log n), since the two loops perform O(n) iterations in all, each iteration does O(1) priority queue operations, and each priority queue operation takes O(log n) time.
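For reference, here is a runnable Python rendering of this pseudocode (my own sketch, not from the handout). heapq plays the role of the priority queue; a counter breaks ties so the heap never compares the tree payloads, and a final tree walk reads off the code words:

    # A runnable sketch (mine) of the pseudocode above.
    import heapq
    from itertools import count

    def huffman(weights):
        """weights: dict mapping item -> weight; returns item -> code word."""
        tick = count()                       # tie-breaker for equal weights
        Q = [(w, next(tick), item) for item, w in weights.items()]
        heapq.heapify(Q)                     # the n INSERTs
        for _ in range(len(weights) - 1):    # n - 1 combining iterations
            wa, _, a = heapq.heappop(Q)      # EXTRACT-MIN
            wb, _, b = heapq.heappop(Q)      # EXTRACT-MIN
            heapq.heappush(Q, (wa + wb, next(tick), (a, b)))
        # Q now contains 1 item, representing the Huffman tree.
        codes = {}
        def walk(node, word):
            if isinstance(node, tuple):      # interior node: (left, right)
                walk(node[0], word + '0')
                walk(node[1], word + '1')
            else:                            # leaf: record its code word
                codes[node] = word or '0'    # lone-item edge case
        walk(Q[0][2], '')
        return codes

    print(huffman({'a': 3, 'b': 1, 'c': 1, 'd': 1}))
    # e.g. {'a': '0', 'd': '10', 'b': '110', 'c': '111'}

Note that the code word lengths match the leaf depths in tree T_2 from the prefix-free codes section, as the theory predicts.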