CS124 Course Notes 8 Spring 2018

In today's lecture we will be looking a bit more closely at the greedy approach to designing algorithms. As we will see, sometimes it works, and sometimes, even when it doesn't, it can still provide a useful result.

Horn Formulae

A simple application of the greedy paradigm solves an important special case of the SAT problem. We have already seen that 2SAT can be solved in linear time. Now consider SAT instances in which each clause contains at most one positive literal. Such formulae are called Horn formulae; for example, this is an instance:

(x ∨ ¬y ∨ ¬z ∨ ¬w) ∧ (¬x ∨ ¬y ∨ ¬w) ∧ (¬x ∨ ¬z ∨ w) ∧ (¬x ∨ y) ∧ (x) ∧ (¬z) ∧ (¬x ∨ ¬y ∨ w).

Given a Horn formula, we can separate its clauses into two parts: the pure negative clauses (those without a positive literal) and the implications (those with a positive literal). We call clauses with a positive literal implications because they can be rewritten suggestively as implications: (x ∨ ¬y ∨ ¬z ∨ ¬w) is equivalent to (y ∧ z ∧ w) ⇒ x. Note that the trivial clause (x) can be thought of as the trivial implication ⇒ x. Hence, in the example above, we have the implications

(y ∧ z ∧ w) ⇒ x,   (x ∧ z) ⇒ w,   x ⇒ y,   ⇒ x,   (x ∧ y) ⇒ w

and these two pure negative clauses:

(¬x ∨ ¬y ∨ ¬w),   (¬z).

We can now develop a greedy algorithm. The idea behind the algorithm is that we start with all variables set to false, and we set a variable to true only if an implication forces us to. Recall that an implication is not satisfied exactly when all variables to the left of the arrow are true and the one to the right is false. This algorithm is greedy in the sense that it (greedily) tries to keep the pure negative clauses satisfied, and changes a variable only when absolutely forced to.

Algorithm Greedy-Horn(φ: CNF formula with at most one positive literal per clause)
    Start with the truth assignment t := FFF...F
    while there is an implication that is not satisfied do
        make the implied variable T in t
    if all pure negatives are satisfied then
        return t
    else
        return "φ is unsatisfiable"
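
Here is a short sketch of Greedy-Horn in Python (the function and variable names are illustrative, not prescribed by these notes). It follows the pseudocode directly, rescanning the implications until nothing changes, so it is not yet the linear-time implementation asked for in the exercise below.

    # A direct sketch of Greedy-Horn (names are illustrative).
    # Each implication is (body, head), standing for the clause (AND of body) => head.
    # Each pure negative clause is a set of variables, standing for the OR of their negations.

    def greedy_horn(variables, implications, pure_negatives):
        t = {v: False for v in variables}           # start with everything false
        changed = True
        while changed:                              # repeat until no implication is violated
            changed = False
            for body, head in implications:
                # violated: all body variables true, head still false
                if not t[head] and all(t[v] for v in body):
                    t[head] = True                  # forced to set the head true
                    changed = True
        # a pure negative clause is satisfied iff at least one of its variables is false
        if all(any(not t[v] for v in clause) for clause in pure_negatives):
            return t
        return None                                 # formula is unsatisfiable

    # The running example from the text:
    variables = ["x", "y", "z", "w"]
    implications = [(["y", "z", "w"], "x"), (["x", "z"], "w"), (["x"], "y"),
                    ([], "x"), (["x", "y"], "w")]
    pure_negatives = [{"x", "y", "w"}, {"z"}]
    print(greedy_horn(variables, implications, pure_negatives))   # prints None: unsatisfiable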

Once we have the proposed truth assignment, we look at the pure negatives. If there is a pure negative clause that is not satisfied by the proposed truth assignment, the formula cannot be satisfied. This follows from the fact that a pure negative clause is satisfied as long as at least one of its variables is set to F. If such a clause is unsatisfied, all of its variables must have been set to T. But we only set a variable to T when the implications force us to, so any satisfying assignment would also have to set those same variables to T; hence no satisfying assignment exists. If all the pure negative clauses are satisfied, then we have found a satisfying truth assignment.

On the example above, Greedy-Horn first flips x to true, forced by the trivial implication ⇒ x. Then y gets forced to true (from x ⇒ y), and similarly w is forced to true. (Why?) Looking at the pure negative clauses, we find that the first, (¬x ∨ ¬y ∨ ¬w), is not satisfied, and hence we conclude that the original formula has no satisfying assignment.

Exercise: Show that the Greedy-Horn algorithm can be implemented in time linear in the length of the formula (i.e., the total number of appearances of all literals).

Huffman Coding

Suppose that you must store the map of a chromosome, which consists of a sequence of 130 million symbols of the form A, C, G, or T. To store the sequence efficiently, you represent each character with just 2 bits: A as 00, C as 01, G as 10, and T as 11. Such a representation is called an encoding. With this encoding, the sequence requires 260 megabits to store.

Suppose, however, that you know that some symbols appear more frequently than others. For example, suppose A appears 70 million times, C 3 million times, G 20 million times, and T 37 million times. In this case it seems wasteful to use two bits to represent each A. Perhaps a more elaborate encoding assigning a shorter string to A could save space.

We restrict ourselves to encodings that satisfy the prefix property: no assigned string is a prefix of another. This property allows us to avoid backtracking while decoding. For an example without the prefix property, suppose we represented A as 1 and C as 101. Then when we read a 1, we would not know whether it was an A or the beginning of a C! Clearly we would like to avoid such problems, so the prefix property is important.

You can picture an encoding with the prefix property as a binary tree. For example, the binary tree below corresponds to an optimal encoding in the above situation. (There can be more than one optimal encoding! Just flip the left and right hand sides of the tree.) Here a branch to the left represents a 0, and a branch to the right represents a 1. Therefore A is represented by 1, C by 001, G by 000, and T by 01. This encoding requires only 213 million bits, roughly an 18% improvement over the balanced tree (the encoding 00, 01, 10, 11). (This does not include the bits that might be necessary to store the form of the encoding!)
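
As a quick check of the 213 million figure, weight each symbol's count by its codeword length:

70 · 1 + 37 · 2 + 20 · 3 + 3 · 3 = 70 + 74 + 60 + 9 = 213 million bits,

compared with 2 · 130 = 260 million bits for the fixed-length encoding.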

Figure 8.1: A Huffman tree. (The root's right child, along the edge labeled 1, is the leaf A (70); its left child, along the edge labeled 0, is an internal node (60) whose right child is the leaf T (37) and whose left child is an internal node (23) with children G (20) and C (3).)

Let us note some properties of the binary trees that represent encodings. The symbols must correspond to leaves; an internal node that represented a character would violate the prefix property. The code words are thus given by all root-to-leaf paths. All internal nodes must have exactly two children, as an internal node with only one child could be deleted to yield a better code. Hence if there are n leaves there are n − 1 internal nodes. Also, if we assign frequencies to the internal nodes, so that the frequency of an internal node is the sum of the frequencies of its children, then the total length produced by the encoding is the sum of the frequencies of all nodes except the root. (A one-line proof: each edge corresponds to a bit that is written as many times as the frequency of the node to which it leads.) For instance, in Figure 8.1 this sum is 70 + 37 + 20 + 3 + 60 + 23 = 213 million, matching the cost computed above.

One final property allows us to determine how to build the tree: the two symbols with the smallest frequencies can be placed together at the lowest level of the tree. Otherwise, we could improve the encoding by swapping a more frequently used character at the lowest level up to a higher level. (This is not a full proof; feel free to complete one.)

This tells us how to construct the optimum tree greedily. Take the two symbols with the lowest frequencies, delete them from the list of symbols, and replace them with a new meta-character whose frequency is the sum of theirs; this new meta-character will lie directly above the two deleted symbols in the tree. Repeat this process until the whole tree is constructed.

We can prove by induction that this gives an optimal tree. It works for 2 symbols (the base case). We also show that if it works for n letters, it must also work for n + 1 letters.

After deleting the two least frequent symbols and replacing them with a meta-character, it is as though we have just n symbols, and the process yields an optimal tree for these n symbols (by the inductive hypothesis). Expanding the meta-character back into the two deleted nodes must now yield an optimal tree, since otherwise we could have found a better tree for the n symbols.

Figure 8.2: The first few steps of building a Huffman tree. (Starting from frequencies A 60, I 40, O 50, U 20, Y 30, the algorithm first merges U and Y into a meta-character [UY] of frequency 50, then merges I with [UY] to form [I[UY]] of frequency 90, and then merges O with A to form [OA] of frequency 110.)

It is important to realize that when we say that a Huffman tree is optimal, this does not mean that it gives the best way to compress a string. It only says that we cannot do better by encoding one symbol at a time. By encoding frequently used blocks of letters (such as, in this section, the block "frequen") we can obtain much better encodings. (Note that finding the right blocks of letters can be quite complicated.) Given this, one might expect that Huffman coding is rarely used. In fact, many compression schemes use Huffman codes at some point as a basic building block. For example, image and video transmission often use Huffman encoding somewhere along the line.

Exercise: Find a Huffman compressor and another compressor, such as a gzip compressor. Test them on some files. Which compresses better?

It is straightforward to write code to generate the appropriate tree, and then use this tree to encode and decode messages. For encoding, we simply build a table with a codeword for each symbol. To decode, we could read bits in one at a time and walk down the tree in the appropriate manner. When we reach a leaf, we output the appropriate symbol and return to the top of the tree.
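
For concreteness, here is one possible sketch in Python, using the standard heapq module; the helper names are illustrative. It builds the tree by repeatedly merging the two lowest-frequency nodes, exactly as described above, and decodes by walking the tree one bit at a time.

    import heapq

    # Each tree node is a tuple (frequency, tiebreak, symbol_or_None, left, right).
    # The unique tiebreak counter keeps heap comparisons from ever reaching the subtrees.
    def build_huffman_tree(freqs):
        counter = 0
        heap = []
        for sym, f in freqs.items():
            heap.append((f, counter, sym, None, None))
            counter += 1
        heapq.heapify(heap)
        while len(heap) > 1:
            # repeatedly merge the two lowest-frequency nodes under a new meta-node
            a = heapq.heappop(heap)
            b = heapq.heappop(heap)
            heapq.heappush(heap, (a[0] + b[0], counter, None, a, b))
            counter += 1
        return heap[0]

    # Walk the tree to build the codeword table (left edge = '0', right edge = '1').
    def codewords(node, prefix="", table=None):
        if table is None:
            table = {}
        freq, _, sym, left, right = node
        if sym is not None:
            table[sym] = prefix or "0"       # degenerate one-symbol case
        else:
            codewords(left, prefix + "0", table)
            codewords(right, prefix + "1", table)
        return table

    def encode(message, table):
        return "".join(table[c] for c in message)

    def decode(bits, root):
        out, node = [], root
        for bit in bits:
            node = node[3] if bit == "0" else node[4]   # walk down the tree
            if node[2] is not None:                     # reached a leaf
                out.append(node[2])
                node = root                             # return to the top of the tree
        return "".join(out)

    # The chromosome example (frequencies in millions):
    freqs = {"A": 70, "C": 3, "G": 20, "T": 37}
    root = build_huffman_tree(freqs)
    table = codewords(root)
    print(table)    # codeword lengths 1, 2, 3, 3 for A, T, G, C (exact 0/1 labels may differ)
    print(decode(encode("GATTACA", table), root))       # GATTACA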

In practice, however, if we want to use Huffman coding, there are much faster ways to decode than to explicitly walk down the tree one bit at a time. Using an explicit tree is slow, for a variety of reasons. Exercise: Think about this.

One approach is to design a system that performs several steps at a time by reading several bits of input and determining what actions to take according to a big lookup table. For example, we could have a table that represents the information: "If you are currently at this point in the tree, and the next 8 bits are 00110101, then output AC and move to this point in the tree." This lookup table, which might be huge, encapsulates the information needed to handle eight bits at once. Since computers naturally handle eight-bit blocks more easily than single bits, and because table lookups are faster than following pointers down a Huffman tree, substantial speed gains are possible. Notice that this gain in speed comes at the expense of the space required for the lookup table.

There are other solutions that work particularly well for very large dictionaries. For example, if you were using Huffman codes on a library of newspaper articles, you might treat each word as a symbol that can be encoded. In this case, you would have a lot of symbols! We will not go over these other methods here; a useful paper on the subject is "On the Implementation of Minimum-Redundancy Prefix Codes," by Moffat and Turpin. The key point to keep in mind is that while thinking of decoding as walking down the Huffman tree one bit at a time is useful conceptually, good engineering would use more sophisticated methods to increase efficiency.

The Set Cover Problem

The inputs to the set cover problem are a finite set X = {x_1, ..., x_n} and a collection S of subsets of X whose union is X. The problem is to find a smallest subcollection T ⊆ S whose sets cover X, that is,

⋃_{A ∈ T} A = X.

Notice that such a cover exists, since S is itself a cover.

The greedy heuristic suggests that we build a cover by repeatedly including the set in S that covers the maximum number of as yet uncovered elements. In this case, the greedy heuristic does not yield an optimal solution. Interestingly, however, we can prove that the greedy solution is a good solution, in the sense that it is not too far from optimal. This is an example of an approximation algorithm. Loosely speaking, with an approximation algorithm we settle for a result that may not be the optimal answer, but we try to prove a guarantee on how close the algorithm's answer is to the optimal one. As we will see later in the course, sometimes this is the best we can hope to do.
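
For reference, here is a small sketch of the greedy heuristic in Python (the names are illustrative). The tiny example at the end shows that greedy need not be optimal: it uses three sets where two suffice.

    # Greedy set cover sketch: X is a set of elements, subsets is a list of sets
    # whose union is X. Repeatedly pick the set covering the most uncovered elements.
    def greedy_set_cover(X, subsets):
        uncovered = set(X)
        cover = []
        while uncovered:
            # choose the set that covers the maximum number of still-uncovered elements
            best = max(subsets, key=lambda s: len(s & uncovered))
            cover.append(best)
            uncovered -= best
        return cover

    # Tiny example: the optimum uses 2 sets ({1,2,3,4} and {5,6,7,8}), but greedy
    # picks the largest set {3,4,5,6,7} first and ends up using 3 sets.
    X = {1, 2, 3, 4, 5, 6, 7, 8}
    subsets = [{3, 4, 5, 6, 7}, {1, 2, 3, 4}, {5, 6, 7, 8}]
    print(greedy_set_cover(X, subsets))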

Claim 8.1 Let k be the size of the smallest set cover for the instance (X, S). Then the greedy heuristic finds a set cover of size at most k ln n.

Proof: Let Y_i ⊆ X be the set of elements that are still not covered after i sets have been chosen by the greedy heuristic. Clearly Y_0 = X. We claim that there must be a set A ∈ S such that |A ∩ Y_i| ≥ |Y_i|/k. To see this, consider the sets in the optimal set cover of X. These sets cover Y_i, and there are only k of them, so one of these sets must cover at least a 1/k fraction of Y_i. Hence

|Y_{i+1}| ≤ |Y_i| − |Y_i|/k = (1 − 1/k)|Y_i|,

and by induction,

|Y_i| ≤ (1 − 1/k)^i |Y_0| = n(1 − 1/k)^i < n e^{−i/k},

where the last inequality uses the fact that 1 + x ≤ e^x, with equality iff x = 0. Hence when i ≥ k ln n we have |Y_i| < n e^{−i/k} ≤ 1, meaning there are no uncovered elements, and hence the greedy algorithm finds a set cover of size at most k ln n.

Exercise: Show that this bound is tight, up to constant factors. That is, give a family of examples where the smallest set cover has size k and the greedy algorithm finds a cover of size Ω(k ln n).