COMPSCI 632: Approximation Algorithms                                August 30, 2017
Lecture 2
Lecturer: Debmalya Panigrahi                                         Scribe: Nat Kell

1 Introduction

In this lecture, we examine a variety of problems for which we give greedy approximation algorithms. We first examine the classic set cover problem and show that the greedy algorithm achieves an $O(\log n)$ approximation. We then examine the maximum coverage and submodular maximization problems and give $(1 - 1/e)$-approximate algorithms for both. Finally, we examine two scheduling problems, precedence constraint scheduling and list scheduling, and give 2-approximate algorithms for both.

2 The Set Cover Problem

The first problem we examine is the set cover problem, which is a generalization of the vertex cover problem we saw in the previous lecture. Set cover is defined as follows.

Definition 1 (SET COVER). As input, we are given a universe $U$ of $n$ elements and $m$ subsets $s_1, \ldots, s_m$, where each $s_i \subseteq U$ has cost $c_i$. The objective is to find a collection of subsets $X$ of minimum total cost that covers all elements of the universe, i.e., $\bigcup_{s_i \in X} s_i = U$.

Since our goal is to cover all the elements at as little cost as possible, a natural approach is the following greedy algorithm. On each step, let $u_i$ denote the number of currently uncovered elements in set $s_i$. The algorithm selects the set that minimizes the ratio $c_i / u_i$, and keeps selecting sets until all elements of the universe are covered.

We now show that this algorithm is an $O(\log n)$ approximation, which turns out to be the best approximation one can achieve (unless P = NP).

Theorem 1. The greedy algorithm for set cover is an $O(\log n)$ approximation.

Proof. Consider a step of the algorithm, and let $k$ be the number of uncovered elements before a set is chosen on this step. Among the sets that have not yet been picked by the algorithm, the optimal solution uses some collection of them to cover the remaining elements. Let $T_{\mathrm{opt}}$ be the set of indices of this collection, and let $\mathrm{OPT}$ be the cost of the optimal solution. Clearly, we have $\sum_{i \in T_{\mathrm{opt}}} c_i \le \mathrm{OPT}$, since these sets are a subset of the total collection used by the optimal solution. Also observe that $\sum_{i \in T_{\mathrm{opt}}} u_i \ge k$, since by definition these sets cover the remaining $k$ elements. (The sum may exceed $k$ because it can double count elements that belong to multiple sets in $T_{\mathrm{opt}}$.) Therefore, we have

$$\min_{i \in T_{\mathrm{opt}}} \frac{c_i}{u_i} \;\le\; \frac{\sum_{i \in T_{\mathrm{opt}}} c_i}{\sum_{i \in T_{\mathrm{opt}}} u_i} \;\le\; \frac{\mathrm{OPT}}{k}, \qquad (1)$$

where the first inequality follows from an averaging argument, and the second inequality follows from our observations above. Thus, on a step where $k$ elements are still uncovered, there must be a set $s_i$ whose ratio $c_i / u_i$ is at most $\mathrm{OPT}/k$.

If the algorithm selects set $s_i$ on a given step, we think of dividing up the cost of the set and charging it to the newly covered elements, i.e., each newly covered element receives a charge of $c_i / u_i$. We now claim that the $j$th element to be covered receives a charge of at most $\mathrm{OPT}/(n - j + 1)$. This follows from Eqn. (1) and the fact that when the $j$th element is covered, at least $n - j + 1$ elements were still uncovered at the start of that step (note also that an element is only charged once, when it is first covered by the algorithm). The total cost of the algorithm is the sum of the charges over all elements, so it is at most

$$\sum_{j=1}^{n} \frac{\mathrm{OPT}}{n - j + 1} \;=\; \mathrm{OPT} \sum_{j=1}^{n} \frac{1}{j} \;=\; \mathrm{OPT} \cdot O(\log n),$$

where the last equality follows from the fact that the $n$th harmonic number is $O(\log n)$.
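As a concrete reference, here is a minimal Python sketch of the greedy rule above. The function name and input conventions (sets as Python sets, costs as a parallel list) are our own choices, not part of the lecture:

    def greedy_set_cover(universe, sets, costs):
        """Repeatedly pick the set minimizing cost per newly covered
        element (the ratio c_i / u_i), until everything is covered."""
        uncovered = set(universe)
        chosen = []
        while uncovered:
            best, best_ratio = None, float("inf")
            for i, s in enumerate(sets):
                u = len(s & uncovered)              # u_i on this step
                if u > 0 and costs[i] / u < best_ratio:
                    best, best_ratio = i, costs[i] / u
            if best is None:
                raise ValueError("sets do not cover the universe")
            chosen.append(best)
            uncovered -= sets[best]
        return chosen

    # Example: the two cheap sets win over the one expensive set.
    print(greedy_set_cover({1, 2, 3, 4},
                           [{1, 2}, {3, 4}, {1, 2, 3, 4}],
                           [1.0, 1.0, 3.0]))        # -> [0, 1]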

3 Maximum Coverage and Submodular Maximization

In this section, we first examine the maximum coverage problem. Maximum coverage is similar to set cover, except that instead of covering all the elements at the smallest possible cost, we are given a parameter $k$ and asked to cover as many elements as possible using only $k$ sets. Formally, maximum coverage is defined as follows.

Definition 2 (MAXIMUM COVERAGE). As input, we are given a universe $U$ of $n$ elements and $m$ subsets $s_1, \ldots, s_m$, where each $s_i \subseteq U$. The objective is to find a collection of subsets $X$ with $|X| = k$ that maximizes the number of covered elements, i.e., $\left| \bigcup_{s_i \in X} s_i \right|$.

The algorithm we use for maximum coverage is also similar to our greedy set cover algorithm: on each step, pick the set that covers the most uncovered elements (note that there is no notion of cost in this problem). We will show that this algorithm is $(1 - 1/e)$-approximate.

Theorem 2. The greedy algorithm for maximum coverage is $(1 - 1/e)$-approximate.

Proof. Let $\mathrm{OPT}$ be the number of elements covered by the optimal solution. Define $\ell_i$ to be the difference between $\mathrm{OPT}$ and the number of elements covered in the first $i$ steps of the algorithm. Note that $\ell_0$, this difference before the start of the algorithm, is exactly $\mathrm{OPT}$. Our first claim is that on the $i$th step of the algorithm, there exists at least one set that covers $\ell_i / k$ new elements. This follows from an argument similar to one used in Theorem 1: let $T$ be the set of elements that are currently uncovered in the algorithm's solution but covered in the optimal solution. By definition, $|T| \ge \ell_i$. Since the optimal solution covers the elements of $T$ with at most $k$ sets, there must be a set that covers at least $|T|/k \ge \ell_i / k$ new elements. Hence, for every $i$, $\ell_i - \ell_{i+1} \ge \ell_i / k$, i.e., $\ell_{i+1} \le (1 - 1/k)\,\ell_i$. Iterating this recurrence, we obtain

$$\ell_k \;\le\; (1 - 1/k)^k\, \ell_0 \;=\; (1 - 1/k)^k \cdot \mathrm{OPT} \;\le\; (1/e) \cdot \mathrm{OPT}.$$

Therefore, the algorithm covers at least $(1 - 1/e) \cdot \mathrm{OPT}$ elements after selecting $k$ sets.
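The greedy rule again translates directly into code. Below is a minimal Python sketch under our own input conventions (sets as Python sets); it returns the chosen indices and the covered elements:

    def greedy_max_coverage(sets, k):
        """Pick k sets, each time taking the set that covers the most
        still-uncovered elements."""
        covered, chosen = set(), []
        for _ in range(k):
            best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
            if not sets[best] - covered:
                break                        # no set adds anything new
            chosen.append(best)
            covered |= sets[best]
        return chosen, covered

    sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}]
    print(greedy_max_coverage(sets, k=2))    # -> ([2, 0], {1, 2, 3, 4, 5, 6, 7})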

Maximum coverage is actually a special case of what is known as submodular maximization. To define this problem, we first give the definition of a submodular function.

Definition 3. Suppose we are given a finite universe $U$. A function $f : 2^U \to \mathbb{R}$ is said to be submodular if for all elements $x \in U$ and all sets $A, B \in 2^U$ such that $A \subseteq B$, we have

$$f(A \cup \{x\}) - f(A) \;\ge\; f(B \cup \{x\}) - f(B). \qquad (2)$$

Intuitively, a submodular function is one with diminishing returns: if we have a collection of elements and we add a new element to this collection, the gain in $f$ is smaller than if we were to add the new element to a subset of this collection. For example, the coverage function, which was the objective we used for set cover and maximum coverage, is submodular. Here, our universe is the collection of sets $s_1, \ldots, s_m$, and our function $f$ maps a collection of sets to the size of their union. This function is submodular because when we add a set to the union of a collection of sets, the number of additional elements we cover is no larger than if we were to add the new set to a subset of the collection.

The problem of finding a set $A$ that maximizes a submodular function $f(A)$ is NP-hard, and the best known algorithm is a $1/2$ approximation. However, if we also assume $f$ is monotone, i.e., $f(A \cup \{x\}) \ge f(A)$ for all sets $A$ and all elements $x$, then we can achieve a $(1 - 1/e)$ approximation for maximizing $f$ over sets of at most $k$ elements, as we did for the maximum coverage problem. In fact, the algorithm is a natural generalization of our greedy maximum coverage algorithm: on each step, we add the element that maximizes the current value of our set.

Theorem 3. The greedy algorithm for monotone submodular maximization is a $(1 - 1/e)$-approximation.

Proof. We will use an argument similar to that used in Theorem 2. Let $S_i$ denote the set of elements chosen by the algorithm after $i$ steps, and let $S^*$ be an optimal solution, i.e., a set of at most $k$ elements that maximizes $f$. Let $\ell_i$ be the difference between the values of these two sets, i.e., $\ell_i = f(S^*) - f(S_i)$. Our goal will again be to show that $\ell_i - \ell_{i+1} \ge \ell_i / k$; once we establish this, the proof becomes identical to that of Theorem 2.

To show this, let $K_i = \{y_1, \ldots, y_{|K_i|}\}$ be the set of elements included in $S^*$ but not in $S_i$ after $i$ steps. Observe that since $f$ is submodular, we have the following inequality:

$$\sum_{j=1}^{|K_i|} \Big( f(S_i \cup \{y_1, \ldots, y_j\}) - f(S_i \cup \{y_1, \ldots, y_{j-1}\}) \Big) \;\le\; \sum_{j=1}^{|K_i|} \Big( f(S_i \cup \{y_j\}) - f(S_i) \Big), \qquad (3)$$

where $\{y_1, \ldots, y_0\}$ is defined as the empty set. Observe that the LHS of Eqn. (3) telescopes and therefore equals $f(S^* \cup S_i) - f(S_i)$. Since $f$ is also monotone, we have $f(S_i \cup S^*) \ge f(S^*)$. Thus, Eqn. (3) implies that:

$$\ell_i \;=\; f(S^*) - f(S_i) \;\le\; f(S^* \cup S_i) - f(S_i) \;\le\; \sum_{j=1}^{|K_i|} \Big( f(S_i \cup \{y_j\}) - f(S_i) \Big) \;\le\; |K_i| \cdot \max_{j} \Big( f(S_i \cup \{y_j\}) - f(S_i) \Big) \;\le\; k (\ell_i - \ell_{i+1}), \qquad (4)$$

where the last inequality in (4) follows from the facts that the algorithm chooses the element that increases the value of $f$ the most, so $\max_j \big( f(S_i \cup \{y_j\}) - f(S_i) \big) \le f(S_{i+1}) - f(S_i) = \ell_i - \ell_{i+1}$, and that $|K_i| \le k$ (the optimal solution can pick at most $k$ elements). Thus, we have $\ell_i - \ell_{i+1} \ge \ell_i / k$, as desired. As we mentioned above, the rest of the proof is identical to that of Theorem 2.
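For concreteness, here is a minimal Python sketch of this greedy algorithm when $f$ is given as a black box on sets. The interface (a callable f taking a Python set, a ground set, and a budget k) is our own assumption, and the sketch recomputes marginal gains naively rather than caching them:

    def greedy_submodular(ground_set, f, k):
        """Greedily pick up to k elements, each maximizing the marginal
        gain f(S + x) - f(S). Assumes f is monotone submodular."""
        S = set()
        for _ in range(k):
            gains = {x: f(S | {x}) - f(S) for x in ground_set if x not in S}
            if not gains:
                break
            x = max(gains, key=gains.get)
            if gains[x] <= 0:
                break                        # monotone f: nothing left to gain
            S.add(x)
        return S

    # Coverage as a monotone submodular function over set indices.
    sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}]
    f = lambda idx: len(set().union(*(sets[i] for i in idx))) if idx else 0
    print(greedy_submodular(range(len(sets)), f, k=2))   # -> {0, 2}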

4 Scheduling Problems

To end the lecture, we will examine two classic scheduling problems, the first of which is known as precedence constraint scheduling. We define this problem as follows:

Definition 4 (PRECEDENCE CONSTRAINT SCHEDULING). As input, we are given $n$ jobs and $k$ machines, where each job has unit length. We are also given a DAG $G$ on the set of jobs, which dictates the order in which jobs can be scheduled, i.e., all ancestors of a job $j$ must be processed before $j$ can be processed. The objective is to minimize the makespan of the schedule, i.e., the completion time of the job that finishes last.

Our greedy algorithm for this problem is straightforward. On the $i$th step of the algorithm, let $R_i$ be the set of jobs that are ready to be scheduled (i.e., all jobs whose ancestors have already been processed). If $|R_i| \le k$, the algorithm simply schedules these jobs in parallel on an arbitrary set of $|R_i|$ machines. If $|R_i| > k$, we arbitrarily pick a subset of $k$ jobs from $R_i$ and schedule them in parallel. We repeat this procedure until all jobs have been scheduled. We now show that this algorithm is a 2 approximation.

Theorem 4. The greedy algorithm for precedence constraint scheduling is a 2 approximation.

Proof. We will use the following two lower bounds on the optimal makespan. The first is that the optimal schedule must take at least as long as the length $p$ of the longest path in $G$ (which is immediate from the definition of the problem). The second is that the optimal solution must take at least $n/k$ time steps, since at any time step it can schedule at most $k$ jobs.

Consider a run of the algorithm. We call a time step full if the algorithm assigns jobs to all $k$ machines; otherwise, we call it a non-full time step. First observe that by definition there can be at most $n/k$ full time steps. Next, we claim that there can be at most $p$ non-full time steps. To see this second claim, note that on the $i$th step we can think of the algorithm as having some remaining graph $G_i$ to process (i.e., the graph after removing all jobs that have been previously scheduled), where the jobs in $R_i$ correspond to the source nodes of this remaining graph. Since the longest path in $G_i$ must start at one of these source nodes, and on non-full steps we schedule all available jobs, the longest path in $G_{i+1}$ is one less than that of $G_i$ after a non-full step. The length of the longest path in the remaining graph cannot increase on full time steps; thus, there can be at most $p$ non-full time steps.

Since the total number of steps taken by the algorithm is the sum of full and non-full steps, and each of these two counts is bounded by a lower bound on the optimal makespan (as discussed above), it follows that the algorithm's solution is 2-approximate.
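A minimal Python sketch of this greedy schedule appears below. The input convention (the DAG given as a list of (u, v) edges, meaning u must precede v) is our own; each returned batch is one unit-length time step:

    def greedy_precedence_schedule(n, edges, k):
        """Schedule n unit jobs on k machines respecting precedence
        edges (u, v): u must finish before v starts. Returns one list
        of jobs per time step, each of size at most k."""
        preds = [0] * n
        succs = [[] for _ in range(n)]
        for u, v in edges:
            succs[u].append(v)
            preds[v] += 1
        ready = [j for j in range(n) if preds[j] == 0]
        schedule = []
        while ready:
            batch, ready = ready[:k], ready[k:]   # run up to k ready jobs
            schedule.append(batch)
            for u in batch:                       # newly freed jobs become
                for v in succs[u]:                # ready for the next step
                    preds[v] -= 1
                    if preds[v] == 0:
                        ready.append(v)
        return schedule

    # Chain 0 -> 1 -> 2 plus two independent jobs, on k = 2 machines.
    print(greedy_precedence_schedule(5, [(0, 1), (1, 2)], k=2))
    # -> [[0, 3], [4, 1], [2]]: 3 steps, matching the longest path p = 3.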
For our last problem, we consider the list scheduling problem. We define this problem as follows.

Definition 5 (LIST SCHEDULING). As input, we are given $n$ jobs and $k$ machines, where each job $j$ has a specified processing time $p_j$. The objective is to assign each job to a machine such that the total processing time of the machine that finishes last (i.e., the makespan of the schedule) is minimized.

Again, our greedy algorithm for this problem is straightforward. We simply scan the jobs in an arbitrary order, and schedule each job on the machine that currently has the smallest total processing time. Again, we show that this algorithm is 2-approximate.

Theorem 5. The greedy algorithm for list scheduling is a 2 approximation.

Proof. We again use two lower bounds on the optimal makespan, which are similar to the two lower bounds used in Theorem 4. First, the optimal makespan must be at least $\sum_j p_j / k$, since the machine completion times add up to the total processing time, so some machine receives at least the average load. Second, for any job $j$, the optimal makespan must be at least $p_j$.

Consider the machine $i$ that finishes last in the algorithm's schedule, and let $j^*$ be the last job assigned to this machine. Observe that since we schedule each job on the machine that currently has the smallest load, all machines other than $i$ must be busy when job $j^*$ starts. This implies that the start time of $j^*$ is at most $\sum_j p_j / k$. Therefore, since $i$ is the machine that finishes last, the makespan of the schedule is at most $\sum_j p_j / k + p_{j^*}$. As we argued above, both of these terms are lower bounds on the optimal makespan. Thus, the algorithm is a 2 approximation.
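To close, here is a minimal Python sketch of list scheduling using a heap of machine loads; the function name and return format are our own:

    import heapq

    def list_schedule(processing_times, k):
        """Assign each job, in the given order, to the currently
        least-loaded machine. Returns (makespan, machine per job)."""
        loads = [(0.0, m) for m in range(k)]       # (load, machine id)
        heapq.heapify(loads)
        assignment = []
        for p in processing_times:
            load, m = heapq.heappop(loads)         # least-loaded machine
            assignment.append(m)
            heapq.heappush(loads, (load + p, m))
        return max(load for load, _ in loads), assignment

    print(list_schedule([2, 3, 4, 1, 2], k=2))
    # -> (6.0, [0, 1, 0, 1, 1]); here sum(p_j)/k = 6, so this is optimal.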