CS 161: Design and Analysis of Algorithms



Announcements Homework 3, problem 3 removed

Greedy Algorithms 4: Huffman Encoding / Set Cover

Alphabets and Strings. Alphabet = finite set of symbols. English alphabet = {a, b, c, ..., z}. Hex values = {0, 1, ..., 9, A, B, C, D, E, F}. String = sequence of symbols from some alphabet, e.g. "This is a string".

How to Encode. Computers store things as 0s and 1s. How do we encode strings as sequences of bits? The encoding must be invertible (one-to-one), and we want to use as few bits as possible. One approach: choose an encoding for characters, and induce an encoding of strings by concatenating the codes for each character.

How to Encode. Obvious solution: if the alphabet size is 2^k for some k, encode each character using k bits. Each character takes k bits, so n characters take kn bits total. Example: A = 00, B = 01, C = 10, D = 11, so ABACBDAAADBAC encodes as 00010010011100000011010010.

How to Encode. Issues: Wasteful: if there are not exactly 2^k characters, some bit sequences are never used. Example: A = 00, B = 01, C = 10; the sequence 11 is never used.

How to Encode. Issues: What if one character occurs very often? AAAAAAABAAACAABAADAAAAAAACAAAB. If almost all letters are A's, then an encoding that uses fewer bits to represent A and more bits to represent everything else would save space.

Variable Length Encoding. A variable length encoding is an encoding of characters as bits where different letters may use different numbers of bits. We still need the encoding of strings to be one-to-one. What does this say about the encoding of characters?

Variable Length Encoding. Example: A = 0, B = 01, C = 10, D = 11. Then AC encodes to 010, but BA also encodes to 010. Not one-to-one!

Prefix-Free Encoding. A prefix of a bit sequence is its first i bits, for some i. Example: prefixes of 0100101101000110101 include 0, 01, 010, 0100, 01001.

Prefix-Free Encoding. A prefix-free encoding is an encoding of an alphabet such that no encoding of any character is a prefix of the encoding of any other character. The code A = 0, B = 01, C = 10, D = 11 is not prefix-free: the encoding of A is a prefix of the encoding of B.

Prefix-Free Encoding. The code A = 0, B = 10, C = 110, D = 111 is prefix-free: no codeword is a prefix of any other.
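As a quick sanity check, here is a short sketch that tests whether a code table is prefix-free; the two tables are the ones from these slides:

```python
# A code is prefix-free iff no codeword is a prefix of any other codeword.
def is_prefix_free(codes):
    words = list(codes.values())
    return not any(i != j and words[j].startswith(words[i])
                   for i in range(len(words)) for j in range(len(words)))

print(is_prefix_free({"A": "0", "B": "01", "C": "10", "D": "11"}))    # False
print(is_prefix_free({"A": "0", "B": "10", "C": "110", "D": "111"}))  # True
```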

Prefix-Free Encoding. Theorem: Any prefix-free encoding of an alphabet induces a one-to-one encoding of strings over that alphabet.

Prefix-Free Encoding. Proof: Suppose toward contradiction that S and T are two different strings that map to the same sequence of bits. Assume w.l.o.g. that S and T differ in the first character. Let c be the first character of S and d the first character of T, and let E(c) and E(d) be the encodings of c and d. Assume w.l.o.g. |E(c)| ≥ |E(d)|.

Prefix-Free Encoding. Since all bits in the encodings of S and T are the same, the first |E(d)| bits of each are the same. Therefore, the first |E(d)| bits of E(c) are equal to E(d), so E(d) is a prefix of E(c). Since c was assumed different from d, our encoding is not prefix-free, a contradiction.

Tree View of Prefix-Free Encoding. Every node represents a partial codeword. Every node has two children: one for appending 0 to the partial codeword, one for appending 1. Leaves correspond to actual codewords. The root is the empty codeword.

Tree View of Prefix-Free Encoding. Example: the tree for the code A = 0, B = 11, C = 100, D = 101, with each left edge labeled 0 and each right edge labeled 1.

Tree View of Prefix-Free Encoding. To encode: find the path from the root to the character and concatenate the edge labels. To decode b_1 b_2 ...: starting from the root, follow the edge labeled b_1, then the edge labeled b_2, and so on, until we reach a leaf. Output that character, and start over from the root.

Optimal Encoding. What is the best possible prefix-free encoding we can find? Let n be the length of the string. Let C be the cost of the encoding, defined as (length of encoding)/n. Then C is the average codeword length, weighted by character frequency.

Optimal Encoding. Let l_i be the length of the encoding of character i. Let f_i be the frequency with which i occurs in the string: f_i = (number of instances of i)/n. Then C = Σ_i f_i l_i.
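A small worked example of the cost formula C = Σ_i f_i l_i, using the code A = 0, B = 11, C = 100, D = 101; the frequencies below are illustrative assumptions, not from the slides:

```python
# Cost of a code = average codeword length weighted by character frequency.
# Frequencies here are an illustrative assumption.
freqs   = {"A": 0.45, "B": 0.25, "C": 0.15, "D": 0.15}
lengths = {"A": 1, "B": 2, "C": 3, "D": 3}

C = sum(freqs[c] * lengths[c] for c in freqs)
# 0.45*1 + 0.25*2 + 0.15*3 + 0.15*3 = 1.85 bits per character on average
print(C)
```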

Optimal Encoding. l_i is also the depth of character i in the encoding tree. An optimal encoding is always a full binary tree: if there is a node with only one child, replace the node with its child. The depths of the leaves only decrease.

Optimal Encoding. Entropy: H = −Σ_i f_i log f_i. Theorem (Shannon Coding Theorem): C ≥ H.
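The entropy can be computed directly; here is a sketch for some illustrative frequencies (an assumption, not from the slides), with logs in base 2:

```python
# Entropy H = -sum_i f_i * log2(f_i). By the Shannon Coding Theorem, any
# prefix-free code for these frequencies has cost C >= H.
import math

freqs = [0.45, 0.25, 0.15, 0.15]
H = -sum(f * math.log2(f) for f in freqs)
print(H)   # about 1.84
```

For comparison, the code A = 0, B = 11, C = 100, D = 101 has cost 1.85 on these frequencies, just above H, as the theorem requires.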

Proof of Coding Theorem. Let g(x) = −x log x (so H = Σ_i g(f_i)). Lemma (concavity of g): g((x+y)/2) ≥ (g(x) + g(y))/2.

Proof of Coding Theorem. Base case: only 2 characters. The only possible encoding gives each character 1 bit, so C = 1. Then H = g(f_1) + g(f_2) = 2 · (g(f_1) + g(f_2))/2 ≤ 2 · g((f_1 + f_2)/2) = 2 · g(1/2) = 2 · (1/2) · log 2 = 1 = C.

Proof of Coding Theorem. Inductively assume the theorem holds for m−1 characters. Let T be the tree corresponding to an optimal encoding over some alphabet of m characters. There are at least two leaves at the bottom level; assume w.l.o.g. these correspond to characters 1 and 2. Replace all instances of characters 1 and 2 with a new character, which has frequency f_1 + f_2.

Proof of Coding Theorem. Now we have an alphabet of size m−1. Encoding for this alphabet: start with T, delete the nodes corresponding to characters 1 and 2, and assign the new character to the parent of these nodes (which is now a leaf). The new character has code length one less than the deleted characters.

Proof of Coding Theorem. How does C change? We removed character 1 with length l and frequency f_1, removed character 2 with length l and frequency f_2, and added a new character with length l − 1 and frequency f_1 + f_2. Since C = Σ_i f_i l_i: C' = C − (f_1 + f_2) · l + (f_1 + f_2)(l − 1) = C − (f_1 + f_2).

Proof of Coding Theorem. By the inductive assumption, C' ≥ H' = −Σ_{i≥3} f_i log f_i − (f_1 + f_2) log(f_1 + f_2). Recall C = C' + f_1 + f_2.

Proof of Coding Theorem. Then C = C' + (f_1 + f_2) ≥ H' + (f_1 + f_2) = −Σ_{i≥3} f_i log f_i − (f_1 + f_2) log(f_1 + f_2) + (f_1 + f_2). By the lemma, (f_1 + f_2) − (f_1 + f_2) log(f_1 + f_2) = 2 · g((f_1 + f_2)/2) ≥ g(f_1) + g(f_2). Therefore C ≥ −Σ_{i≥3} f_i log f_i + g(f_1) + g(f_2) = −Σ_i f_i log f_i = H.

How to Find Optimal Encoding. Claim 1: There is an optimal solution where the two least frequent characters have the longest codewords (i.e., are at the lowest level of the tree) and are identical except for the last bit. If not, swap these two characters with two of the characters with the longest codewords; we can swap with two that are siblings.

How to Find Optimal Encoding. Assume the two lowest-frequency characters are 1 and 2. What if we merge the two characters into a new character with frequency f_1 + f_2? The new character gets the codeword obtained by dropping the last bit of the codewords for 1 or 2.

Merging Two Characters. Before: A = 0, B = 11, C = 1000, D = 101, E = 1001.

Merging Two Characters. After merging C and E: A = 0, B = 11, [CE] = 100, D = 101.

How to Find Optimal Encoding. Claim 2: For any optimal encoding, the encoding obtained by merging characters 1 and 2 must be an optimal encoding for the reduced alphabet, where characters 1 and 2 are replaced with a new character of frequency f_1 + f_2.

How to Find Optimal Encoding.
Before merging: A: f_1, codeword 0; B: f_2, codeword 11; C: f_3, codeword 1000; D: f_4, codeword 101; E: f_5, codeword 1001.
After merging C and E: A: f_1, codeword 0; B: f_2, codeword 11; [CE]: f_3 + f_5, codeword 100; D: f_4, codeword 101.

How to Find Optimal Encoding. Idea: take the two characters with lowest frequency, merge them, recursively solve the reduced problem, then split the characters apart again.

How to Find Optimal Encoding. Frequencies: A: 0.45, B: 0.25, C: 0.10, D: 0.15, E: 0.05 (no codewords yet).


How to Find Optimal Encoding. After merging C and E: A: 0.45, B: 0.25, [CE]: 0.15, D: 0.15.


How to Find Optimal Encoding. After merging [CE] and D: A: 0.45, B: 0.25, [[CE]D]: 0.30.


How to Find Optimal Encoding. After merging [[CE]D] and B: A: 0.45, [[[CE]D]B]: 0.55.


How to Find Optimal Encoding. After the final merge: [A[[[CE]D]B]]: 1.00. Now split the characters apart again, assigning codewords top-down.


How to Find Optimal Encoding. A: 0.45, codeword 0; [[[CE]D]B]: 0.55, codeword 1.

How to Find Optimal Encoding. A: 0.45, codeword 0; B: 0.25, codeword 11; [[CE]D]: 0.30, codeword 10.

How to Find Optimal Encoding. A: 0.45, codeword 0; B: 0.25, codeword 11; [CE]: 0.15, codeword 100; D: 0.15, codeword 101.

How to Find Optimal Encoding. Final code: A: 0.45, codeword 0; B: 0.25, codeword 11; C: 0.10, codeword 1000; D: 0.15, codeword 101; E: 0.05, codeword 1001.

How to Find Optimal Encoding.
Let q be a heap of characters, ordered by frequency.
For each character c: q.insert(c)
While q has at least two characters:
    c_1 = q.deletemin(), c_2 = q.deletemin()
    Create a node labeled [c_1 c_2] with children c_1 and c_2
    f([c_1 c_2]) = f(c_1) + f(c_2)
    q.insert([c_1 c_2])
Return q.deletemin()
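The pseudocode above can be sketched as runnable Python using heapq as the priority queue; the insertion counter is an implementation detail (an assumption, not part of the slides) that breaks frequency ties without comparing trees:

```python
# Huffman's algorithm: repeatedly merge the two least frequent nodes.
import heapq
from itertools import count

def huffman(freqs):
    """freqs: {character: frequency}. Returns {character: codeword}."""
    tick = count()
    heap = [(f, next(tick), c) for c, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)    # two least frequent nodes
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (t1, t2)))  # merged node
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):        # internal node: 0 left, 1 right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                              # leaf: record the codeword
            codes[node] = prefix or "0"
    walk(root, "")
    return codes

codes = huffman({"A": 0.45, "B": 0.25, "C": 0.10, "D": 0.15, "E": 0.05})
print(codes)   # codeword lengths: A:1, B:2, D:3, C:4, E:4
```

The exact 0/1 labels may differ from the slides' tree, but the codeword lengths, and hence the cost, are the same.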

Running Time. n inserts initially: O(n log n). Every run of the loop decreases the size of the heap by 1, so there are n−1 runs of the loop. Each run involves 3 heap operations: O(log n). Total running time: O(n log n).

Set Cover. Given a set of elements B and a collection of subsets S_i, output a selection of the S_i whose union is B, such that the number of subsets used is minimized.

Example: Schools. Suppose we have a collection of towns, and we want to figure out the best towns in which to put schools. We need at least one school within 20 miles of each town, and every school should be in a town.

Example: Schools. B = set of towns. S_i = subset of towns within 20 miles of town i.

Greedy Solution. Obvious solution: repeatedly pick the set S_i with the largest number of uncovered elements.

Example. B = {1, 2, 3, 4, 5, 6}, S_1 = {1, 2, 3}, S_2 = {1, 4}, S_3 = {2, 5}, S_4 = {3, 6}.

Example (Greedy Algorithm). B = {1, 2, 3, 4, 5, 6}, S_1 = {1, 2, 3}, S_2 = {1, 4}, S_3 = {2, 5}, S_4 = {3, 6}. Sets used: {}. Elements left: {1, 2, 3, 4, 5, 6}.

Example (Greedy Algorithm). Sets used: {S_1}. Elements left: {4, 5, 6}.

Example B = {1, 2, 3, 4, 5, 6} S 1 = {1, 2, 3} S 2 = {1, 4} S 3 = {2, 5} S 4 = {3, 6} Sets used: {S 1, S 2 } Greedy Algorithm Elements let: {5, 6}

Example B = {1, 2, 3, 4, 5, 6} S 1 = {1, 2, 3} S 2 = {1, 4} S 3 = {2, 5} S 4 = {3, 6} Sets used: {S 1, S 2, S 3 } Greedy Algorithm Elements let: {6}

Example B = {1, 2, 3, 4, 5, 6} S 1 = {1, 2, 3} S 2 = {1, 4} S 3 = {2, 5} S 4 = {3, 6} Sets used: {S 1, S 2, S 3, S 4 } Greedy Algorithm Elements let: {}

Example B = {1, 2, 3, 4, 5, 6} S 1 = {1, 2, 3} S 2 = {1, 4} S 3 = {2, 5} S 4 = {3, 6} Sets used: {S 1, S 2, S 3, S 4 } Greedy Algorithm Optimal: { S 2, S 3, S 4 } Elements let: {}

Set Cover. The greedy algorithm isn't optimal! Obtaining the optimal solution is believed to be hard. Settle for approximation: if the optimal solution uses k sets, we want a solution using only slightly more than k sets.

Approximation. Claim: If B contains n elements and the optimal solution uses k sets, then greedy uses at most k ln n sets.

Proof. Let n_t be the number of uncovered elements after t iterations of the greedy algorithm (n_0 = n). The remaining elements are covered by the optimal k sets, so some set must contain at least n_t/k of the uncovered elements. Therefore, greedy picks a set that covers at least n_t/k of the remaining elements.

Proof. Since greedy picks a set that covers at least n_t/k of the remaining elements, n_{t+1} ≤ n_t − n_t/k = n_t(1 − 1/k). Therefore, n_t ≤ n_0(1 − 1/k)^t = n(1 − 1/k)^t.

Proof. Fact: 1 − x ≤ e^(−x), with equality if and only if x = 0.

Proof. n_t ≤ n(1 − 1/k)^t < n(e^(−1/k))^t = n·e^(−t/k). After t = k ln n iterations, n_t < n·e^(−ln n) = 1. Therefore, after t = k ln n iterations, n_t = 0, since n_t is a nonnegative integer. Therefore, the greedy algorithm uses at most k ln n sets, as desired.

Can We Do Better? Our algorithm achieves an approximation ratio of ln n. This raises two questions: Can the analysis be tightened so that greedy achieves a better approximation ratio? Are there more sophisticated algorithms that achieve a better approximation ratio? The answer to both: most likely not. If some efficient algorithm could do much better, then we could solve a whole host of very difficult problems efficiently.