Greedy Algorithms and Huffman Coding


Henry Z. Lo
June 10, 2014

1 Greedy Algorithms

1.1 Change making problem

Problem 1. You have quarters, dimes, nickels, and pennies. Given some amount, n, provide the least number of coins which sum up to n.

For example, let n = 15. There are four different coin combinations to get 15 (see Figure 1). The optimal combination is one dime and one nickel.

It turns out that for this problem, we can just pick the largest coin value possible and repeat until done. Because this algorithm takes the largest possible step toward its goal every time (e.g. from 0 to 15), it is called greedy. This produces the optimal solution for this particular set of coins, but does not work in general.

Problem 2. You have 1-, 7-, and 9-cent coins. Given some amount, n, provide the least number of coins which sum up to n.

If we let n = 15 again, then the first greedy choice takes the 9 coin. The remaining 6 must be made up with 1 coins, yielding a 7-coin solution. The optimal configuration is 7, 7, 1, which uses only three coins.

Figure 1: The four ways of getting 15 cents. The dime and nickel combination is optimal.
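The two problems above can be checked with a short Python sketch (the function names here are ours, not from the notes). It compares the greedy strategy against an exhaustive dynamic-programming optimum:

```python
def greedy_coins(n, coins):
    """Repeatedly take the largest coin that still fits."""
    count = 0
    for c in sorted(coins, reverse=True):
        take = n // c
        count += take
        n -= take * c
    return count

def optimal_coins(n, coins):
    """Minimum number of coins, by dynamic programming (for comparison)."""
    best = [0] + [float("inf")] * n
    for amount in range(1, n + 1):
        for c in coins:
            if c <= amount:
                best[amount] = min(best[amount], best[amount - c] + 1)
    return best[n]

print(greedy_coins(15, [25, 10, 5, 1]))  # 2: dime + nickel, greedy is optimal
print(greedy_coins(15, [1, 7, 9]))       # 7: one 9 and six 1s
print(optimal_coins(15, [1, 7, 9]))      # 3: 7 + 7 + 1
```

For US coins the two functions agree; for the {1, 7, 9} coin set they diverge, which is exactly the failure of the greedy choice property described above.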

1.2 Greedy methods

Greedy methods apply to combinatorial optimization problems. In these problems, we try to find some optimal set or combination of elements that minimizes or maximizes something, subject to constraints. In the previous case, the elements are coins, the constraint is that the coins sum to n, and we want to minimize the number of coins. These problems pop up all over computer science, so we will spend some time studying them.

1.2.1 Problems solved by greedy algorithms

To determine if a greedy algorithm works for a problem, we need to show that the problem has the following properties:

Greedy choice property: some optimal solution contains the greedy choice.

Optimal substructure: an optimal solution consists of an optimal subsolution, plus an optimal choice.

Change making has optimal substructure. Consider an optimal solution for n, and let c be one of its coins. Then the coins making up n - c must themselves be optimal: if they weren't, you could make up n - c with fewer coins, and the original solution would not have been optimal.

Change making does not have the greedy choice property in general - as we've seen, sometimes the optimal solution does not contain the greedy choice. However, for US currency it does: if n >= 25, some optimal solution contains a quarter; if 25 > n >= 10, it contains a dime, and so on. We will not emphasize mathematical proofs in this course, but know that there is a method for determining when greedy algorithms can be applied.

1.2.2 Properties of greedy algorithms

Despite their limitations, greedy algorithms are worth knowing because when they do solve a problem, they are extremely efficient: there is no backtracking, and they are easy to program. Consider the following program for the change-making problem.

Greedy solution:

    sum = 0
    numcoins = 0
    coins = [25, 10, 5, 1]
    while sum < n:
        for c in coins:
            if n - sum >= c:
                sum = sum + c
                numcoins = numcoins + 1
                break

No algorithm is more efficient; to search the problem space exhaustively, you would instead have to build a tree-like structure.

Figure 2: A set of activities with start and finish times. If we want to perform the most activities, A1, A4, A8, and A11 is an optimal choice.

1.2.3 Drawbacks

Greedy methods always take the largest step toward the goal, but in doing so they may miss better combinations, as seen in the second problem. They do not take the future into account (if I take 7 instead of 9 now, then I can take another big coin in my next step). However, even when greedy algorithms don't give the best result, the greedy choice can be a good heuristic for getting reasonable solutions (it can also be a very bad one). Some problems are intractable, and greedy algorithms can give decent answers in a reasonable time.

1.3 Activity selection

Problem 3. There are n activities, each with start time s_i and finish time f_i. How do you pick a set of non-overlapping activities so that you perform as many activities as possible?

The solution, perhaps surprisingly, is to repeatedly pick the earliest-ending remaining activity and remove the activities that overlap it (see Figure 2).
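The earliest-finish-time rule can be sketched in a few lines of Python (the function name and the sample intervals are ours; they are not the activities from Figure 2). Sorting by finish time makes each greedy choice a single pass:

```python
def select_activities(activities):
    """Greedy activity selection: repeatedly take the earliest-finishing
    activity that does not overlap the ones already chosen."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:      # no overlap with previous choice
            chosen.append((start, finish))
            last_finish = finish
    return chosen

# (start, finish) pairs; made-up sample data
acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(select_activities(acts))  # [(1, 4), (5, 7), (8, 11)]
```

After the O(n log n) sort, the scan is linear, so the whole algorithm runs in O(n log n) time.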

Figure 3: A tree representing prefix codes.

We can prove that this greedy solution works. Order the activities a_i by finish time, so that a_1 is the first activity to finish.

a_1 must be in some optimal solution. Take any optimal solution; if it does not contain a_1, consider its earliest-finishing activity. That activity finishes no earlier than a_1 does, so replacing it with a_1 creates no new overlaps, and we still have an optimal solution. This proves the greedy choice property.

After choosing a_1 and removing the activities that overlap it, we must select a maximum set of non-overlapping activities from the reduced set. Thus, the problem has the optimal substructure property.

2 Huffman Coding

2.1 Idea

In ASCII encoding, each character is represented by 8 bits. However, some letters are extremely rare, and some are very common. We can define an alternative mapping in which less common characters like z require more than 8 bits, and more common characters like s take up fewer than 8 bits. Using this encoding, we can save space.

However, with variable-length encoding, we have to make sure no code is a prefix of another. For example, we cannot let 0 encode a and 01 encode b, because on reading a 0 we won't know whether to decode it right away or wait for more input. There must be no ambiguity in parsing. Codes with this property are called prefix codes.

Because these codes consist of bits, we can represent them using a binary tree, as in Figure 3. Each path in the tree represents a bit string. Note that characters appear only at leaves: if one character descended from another, then one code would be a prefix of the other.
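The unambiguity of prefix codes is easy to see in code. The following sketch (the code table is a made-up example, not the one in Figure 3) decodes a bit string by emitting a character as soon as the buffer matches a codeword; prefix-freeness guarantees this can never be premature:

```python
# A small prefix-free code (hypothetical example).
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

def decode(bits, code):
    """Scan the bit string left to right, emitting a character whenever the
    buffer equals a codeword. Because no codeword is a prefix of another,
    the first match is always the right one."""
    reverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return "".join(out)

print(decode("010110111", code))  # "abcd"
```

With a non-prefix code such as {a: 0, b: 01}, the same scan would emit a on the first 0 and could never produce b, which is exactly the ambiguity described above.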

Figure 4: Huffman algorithm for combining characters a1, a2, a3, and a4. Red numbers represent frequencies, and yellow numbers represent codes.

2.2 Huffman codes

Huffman codes solve the problem of finding an optimal encoding. Note that the code depends on the text being compressed. We want shorter codes (shorter paths, fewer ancestors) for characters with very high counts, and we can tolerate longer paths for characters with low counts.

1. Create a leaf node for each character, and put the nodes in a priority queue ordered by count.

2. While the queue has more than one node: remove the two nodes with the lowest counts, merge them under one parent whose count is their sum, and add this parent back to the queue.

3. The last remaining node is the root.

See Figure 4 for a visual. Since efficient priority queue data structures require O(log n) time per insertion, and a tree with n leaves has 2n - 1 nodes, this algorithm runs in O(n log n) time, where n is the number of distinct characters.

2.3 Greediness of Huffman codes

Huffman codes are greedy in the sense that the least frequent characters are given the longest paths in the tree.

Greedy choice property: if x and y have the two lowest counts, then some optimal tree places them at the same, maximal path length, i.e. their codes are two of the longest.

Optimal substructure: every subtree of a Huffman tree defines an optimal code for its leaf characters.
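Steps 1-3 above map directly onto Python's heapq priority queue. The following sketch (function name ours; assumes a nonempty input with at least two distinct characters) represents a tree as either a character (leaf) or a (left, right) pair, and uses an integer tiebreaker so equal counts never compare the trees themselves:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code for text by repeatedly merging the two least
    frequent subtrees, as in the algorithm above."""
    counts = Counter(text)
    # Heap entries: (count, tiebreaker, tree).
    heap = [(c, i, ch) for i, (ch, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)   # two lowest counts
        c2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, tie, (t1, t2)))  # merged parent
        tie += 1
    root = heap[0][2]

    codes = {}
    def walk(tree, path):
        # Left edges append 0, right edges append 1; leaves record their path.
        if isinstance(tree, tuple):
            walk(tree[0], path + "0")
            walk(tree[1], path + "1")
        else:
            codes[tree] = path
    walk(root, "")
    return codes

print(huffman_codes("mississippi"))
```

For "mississippi" (i: 4, s: 4, p: 2, m: 1) the rare characters m and p end up with 3-bit codes while the frequent ones get 1- or 2-bit codes, so the whole string encodes in 21 bits instead of 88 bits of ASCII.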