Beyond Beyond Dominant Resource Fairness: Indivisible Resource Allocation In Clusters


Christos-Alexandros Psomas, alexpsomi@gmail.com
Jarett Schwartz, jarett@cs.berkeley.edu

Abstract

Resource allocation is necessary for any application to be run on a computer system. Dominant Resource Fairness, and other recently proposed mechanisms, handle the problem of fair resource allocation in a datacenter containing different resource types. To date, most have not considered indivisible demands, and none have considered clusters of machines. We analyze various resource allocation algorithms over datacenter clusters. The first part extends previous work on max-min fairness, while the second part corresponds to a complex Multidimensional Bin Packing problem that has not been well explored. Our proposed MergeDRF algorithm increases utilization without much loss to fairness compared to adaptations of algorithms from the resource allocation literature.

1 Introduction

Within a datacenter, machines need to respond to requests for resources, such as CPU and memory, from different users, each with unique tasks. These tasks may require different amounts and ratios of resources based on the size of the tasks and their application domain. The datacenter must devise a mechanism to allocate its resources to the users that satisfies two goals. First, it must be fair, or else the users who are not getting a fair share will go to another datacenter. It must also be efficient, so that the datacenter allocates many tasks, or else it will not be maximizing revenue. Since Ghodsi et al. introduced Dominant Resource Fairness (DRF) [4], there has been a flurry of research on algorithms that balance these two goals [3, 8, 5, 6]. DRF in particular achieves both goals by satisfying several game-theoretic properties we will define in the following section. But, in order to satisfy these properties, DRF and many of the other algorithms make assumptions that do not actually hold in a real datacenter. First, in the divisibility assumption, they assume that the demands from the users of the datacenter are divisible, meaning, for example, that a user can complete 1/2 of a task if it is given 1/2 of its resource demand for that task. But many processes do not yield any intermediary results, so the user will only gain utility if the entire task is completed. To our knowledge, only [8] has addressed this restriction. Secondly, in the single machine assumption, they assume that all of the resources reside on a single monolithic machine. This is unrealistic, as a datacenter is often split up into many smaller machines, from dozens up to thousands. This storage model can be more cost efficient, as well as more secure and robust to failures. To our knowledge, none of the work following DRF has addressed resource allocation on these clusters. Our main constraint in this context is that each task must be scheduled using resources from a single machine in the cluster. We intend to explore mechanisms that emphasize fairness and efficiency under a model that makes neither of these assumptions.

1.1 Related Work

The original DRF paper left dealing with indivisibilities as an open problem in its conclusion. To our knowledge, existing work has only addressed models that make the demands of users indivisible, without addressing clusters. Parkes et al. [8] designed the SequentialMinMax algorithm to handle this case, leaving the removal of the single machine assumption as an open problem.
Other papers since DRF have addressed fairness with other modifications and in other application domains. Ghodsi et al. described a similar algorithmic idea as applied to network middleboxes in [3]. In Parkes et al. [8], in addition to discussing indivisibility, they generalize DRF and show that it satisfies more game-theoretic properties. Gutman and Nisan [5] gave an interpretation of DRF from a game theory and economics standpoint. Joe-Wong et al. [6] define a wide class of DRF-like mechanisms that trade off efficiency and fairness. Though we are interested in the same trade-off between efficiency and fairness as these papers, the elimination of the single machine assumption makes this trade-off highly dependent on the packing of the tasks into the machines, which makes these ideas insufficient. Instead it is necessary to look at ideas from the literature on bin packing. The Bin Packing problem is a classic NP-complete problem, classified by Garey and Johnson [2]. Our packing problem is a multidimensional bin packing problem, as it is over multiple resources. Most of the multidimensional bin packing literature deals with uniform size bins [7], but a few recent papers have also considered variable size bins [1]. We will more precisely describe the added complexity introduced by our problem beyond traditional bin packing in Section 2.1.

The rest of this paper is organized as follows. In Section 2 we formally define our problem. In Section 2.1 we discuss connections between our problem and Bin Packing. We developed several algorithms to attack our problem, and we split them into two parts. First, in Section 3, we describe ways to determine fairness by generating an ordering of tasks to assign. Second, in Section 4, we describe ways to pack these tasks into the machines via packing heuristics. We can choose both of these options independently, giving us a large class of algorithms. In Section 5 we evaluate our algorithms on several different metrics.

2 Model and Definitions

There is a set of users $N = \{1, 2, \dots, n\}$ and a set of machines $M = \{1, 2, \dots, m\}$. Each machine has $k$ resources, so every machine $m_i \in M$ can be represented by a real vector $\vec{m}_i = (r_{i,1}, r_{i,2}, \dots, r_{i,k})$. Every user $i \in N$ has a demand vector $\vec{u}_i = (u_{i,1}, u_{i,2}, \dots, u_{i,k})$, which represents the amount of each resource user $i$ needs in order to schedule one task. Our goal is to find an allocation $A$, where $A(i, j)$ denotes the vector of resources machine $j$ allocates to user $i$. Let $F(\vec{u}_i, \vec{m})$ equal the maximum integer $C \ge 0$ such that $C \cdot u_{i,j} \le m_j$ for all $j$; $F$ gives us the maximum number of tasks with demand $\vec{u}_i$ that fit in a vector $\vec{m}$. So, given an allocation $A$, $F(\vec{u}_i, A(i, j))$ is exactly the number of tasks user $i$ will schedule on machine $j$. We define $J(A)_i$ as the total number of tasks user $i$ will schedule using allocation $A$, that is, $J(A)_i = \sum_j F(\vec{u}_i, A(i, j))$. We also define $J(A)$ as the total number of tasks $A$ schedules: $J(A) = \sum_{i \in N} J(A)_i$.

Dominant Resource Fairness (DRF) on a single machine returns an allocation $x = (x_1, x_2, \dots, x_n)$, where $x_i \in \mathbb{R}$ is the number of tasks user $i$ is able to schedule. DRF was analyzed based on satisfying multiple game-theoretic properties; we will redefine these in our more general framework. DRF is based around the concept that each user $i$ cares most about a particular dominant resource $d(i)$, defined as the resource $j$ that maximizes $u_{i,j}/m_j$, meaning the resource that the user demands in the highest fraction. Given an allocation $A$, the dominant share of a user $i$ is $J(A)_i \cdot u_{i,d(i)} / m_{d(i)}$. The algorithm presented for DRF solves the following optimization problem: given that each user gets resources in proportion to their demand $\vec{u}_i$, choose the allocation $A$ that maximizes the minimum dominant share over the users.
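To make these definitions concrete, here is a minimal Python sketch of the bookkeeping. The names (`F`, `J_i`, `dominant_share`) and the tuple representation are our own illustration, not the paper's implementation:

```python
# Minimal sketch of the model's bookkeeping; assumes every demand vector
# has at least one strictly positive entry.

def F(u, m):
    """Largest integer C >= 0 with C * u[j] <= m[j] for every resource j."""
    return min(int(m_j // u_j) for u_j, m_j in zip(u, m) if u_j > 0)

def J_i(u_i, allocations_of_i):
    """J(A)_i: tasks user i schedules, summed over its per-machine vectors."""
    return sum(F(u_i, a_ij) for a_ij in allocations_of_i)

def dominant_share(u_i, m, tasks):
    """tasks * u_{i,d(i)} / m_{d(i)}, where d(i) maximizes u_{i,j} / m_j."""
    d = max(range(len(m)), key=lambda j: u_i[j] / m[j])
    return tasks * u_i[d] / m[d]
```

For example, with $\vec{u} = (1, 4)$ and $\vec{m} = (5, 8)$, `F` returns $\min(5, 2) = 2$.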
Definition 1. An allocation $A$ satisfies Sharing Incentives (SI) if for each user $i$, the number of tasks he schedules using allocation $A$ is at least the number of tasks he would schedule if he were given $\frac{1}{n}$ of each resource of each machine. Formally, $\forall i \in N$, $J(A)_i \ge \sum_j F(\vec{u}_i, \vec{m}_j / n)$, where $\vec{m}_j / n$ denotes dividing the vector $\vec{m}_j$ pointwise. Sharing Incentives basically says that each user is better off sharing the cluster than using her own equal partition of the cluster.

Definition 2. An allocation $A$ satisfies Envy Freeness (EF) if each user $i \in N$ prefers his allocation to the other users' allocations. An allocation $A$ is $r$-Envy Free (EFr) if each user $i \in N$ does not envy user $j \in N$ when $r$ tasks of $i$ are removed from each machine of the allocation of $j$. Formally, EF means that $\forall i, k \in N$, $J(A)_i \ge \sum_j F(\vec{u}_i, A(k, j))$, and EFr means that $\forall i, k \in N$, $J(A)_i \ge \sum_j F(\vec{u}_i, A(k, j) - r\,\vec{u}_i)$. It is critical that we define Envy Freeness in this per-machine way, and not as $J(A)_i \ge F(\vec{u}_i, \sum_j A(k, j))$, as we can give examples for which it is impossible to produce an allocation that satisfies both Pareto-Optimality and EFr for all $r = O(m)$.

Consider a datacenter with $n = m = j + 1$. Let $\vec{u}_1 = (\epsilon, \epsilon)$ and $\vec{u}_j = (1, 1)$ for all $j > 1$, and $\vec{m}_j = (2 - \epsilon, 2 - \epsilon)$ for all $j$. Now, look at some Pareto-Optimal allocation for this instance. We know that each of the users with $(1, 1)$ demand vectors can be given at most one task on each machine. But, since the allocation is Pareto-Optimal, at least the remainder goes to $u_1$. So, $u_1$ gets at least $(1 - \epsilon, 1 - \epsilon)$ allocated by each machine, for a total of at least $(j - j\epsilon, j - j\epsilon)$ resources over all machines. But, to maximize the minimum allocation over all other users, we must give each of them one task on a single machine. So, the minimum allocation is at most $(1, 1)$, and $u_1$ has $O(j)$ times as many resources, making the minimum user $O(j)$-envious. Due to this example, we must consider resources per machine, rather than the total over all machines, when defining Envy Freeness.

Definition 3. An allocation $A$ is Strategy-Proof (SP) if $\forall i \in N$, reporting $\vec{u}_i$ is a dominant strategy; that is, user $i$ cannot schedule more tasks by lying about $\vec{u}_i$.

Definition 4. An allocation $A$ is Pareto-Optimal (PO) if there are not enough unallocated resources for a user to schedule one extra task. That is, $\forall i \in N$ and $j \in M$, it is not possible to increase the value $A(i, j)$ (such that $F(\vec{u}_i, A(i, j))$ increases) while $A$ remains feasible.

We know from [8] that with indivisible demands, it is impossible to create a mechanism that is both Strategy-Proof and Pareto-Optimal, or both Strategy-Proof and Envy Free. These restrictions trivially hold in our case, since indivisible demands is a subproblem of our setup, with the cluster consisting of a single machine. Due to this restriction, and since we are focusing on revenue maximization, we emphasize satisfying Pareto-Optimality in our algorithms, rather than Envy Freeness or Strategy-Proofness.

2.1 Bin Packing

We want to relate our model to the classical bin packing algorithms found in the theory literature. There are some obvious differences, considering that bin packing is on identical unit machines and counts the number of bins, rather than the number of tasks scheduled or the room remaining on the machines. But, the performance of algorithms in one model is equivalent to the performance of the algorithms in a setting of the other model, so in some sense these problems are equivalent. First, we define the Bin Packing problem: given $n$ tasks, each of which is a $d$-dimensional vector $\vec{n}_i = (n_{i,1}, n_{i,2}, \dots, n_{i,d})$, and $d$-dimensional bins of capacity $B$, find the minimum number of bins $m$ such that the tasks can be packed in the bins. Assume we have two algorithms $A$ and $B$ such that for a given set of $n$ tasks, $A$ outperforms $B$, meaning $A$ packs in $b_A$ bins and $B$ packs in $b_B$ bins with $b_A < b_B$. Then, we can construct an instance of our problem such that $A$ outperforms $B$: let our set of tasks be the same $d$-dimensional vectors, and let us have $b_A$ machines of equal capacity $B$. Then, $A$ can schedule all $n$ tasks on these machines, but $B$ cannot, or else it would have fit the original bin packing vectors in $b_A$ bins. We can also do this equivalence in the other direction. Assume we have algorithms $A$ and $B$ such that for a given number of identical machines $m$ and set $N$ of users, $A$ can schedule $J(A)_i$ tasks of user $i$ and $B$ can schedule $J(B)_i$ tasks of user $i$, with $J(A)_i \ge J(B)_i$ for every $i$ and $J(A)_j > J(B)_j$ for at least one user $j \in N$. Then, we can construct an instance of Bin Packing with the set of tasks equal to the set of $J(A)$ tasks $A$ scheduled. $A$ can still pack these in $m$ bins, but $B$ will be forced to use at least one more bin, or else it would have scheduled $J(A)$ tasks in the original problem.

To be precise, our problem is a subproblem of the Maximum Cardinality Variable Sized Multidimensional Bin Packing Problem. The traditional bin packing problem tries to minimize the number of bins needed to pack some number of tasks. We, on the other hand, have a fixed number of bins, and want to maximize the space we fill on these bins, a Maximum Cardinality constraint. The Maximum Cardinality part could be replaced by Dual, which sometimes refers to counting tasks instead of bins, but in the literature Dual also refers to the Bin Covering problem, so we will refer to this as Maximum Cardinality. Traditional bin packing is also on a single dimension, but we have vectors of dimension equal to the number of resources. Finally, bin packing is usually on equally sized unit bins, while our bins have different sizes based on how much of each type of resource a machine holds. While each of these modifications to bin packing has been studied in isolation, there is no theoretical work considering all of them at the same time, though the closest is likely the work on Variable Sized Multidimensional Packing by Epstein and van Stee [1].
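As a rough illustration of this objective (our own toy code, not an algorithm from the paper), a single greedy pass over variable-sized multidimensional bins already yields a feasible, if suboptimal, task count:

```python
def fits(task, free):
    """True if the task's demand vector fits in the remaining capacity."""
    return all(t <= f for t, f in zip(task, free))

def greedy_task_count(tasks, bins):
    """Greedy lower bound for Maximum Cardinality packing: place each task
    on the first bin with room, and count how many get placed."""
    free = [list(b) for b in bins]
    placed = 0
    for task in tasks:
        for f in free:
            if fits(task, f):
                for l, t in enumerate(task):
                    f[l] -= t           # consume the bin's capacity
                placed += 1
                break
    return placed

# e.g. greedy_task_count([(3, 1), (1, 4), (1, 4)], [(4, 10), (5, 8)]) == 3
```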

3 Fairness Algorithms

3.1 DRF Separate

The first, most natural attempt at an algorithm based on DRF is DRF Separate: run DRF on each machine in the cluster separately, round the solution down to integers, and give that number of tasks to each user on each machine. But this algorithm performs very poorly: for a large enough number of users, it is not possible to give every user a task on each machine, so it is possible that all allocations will round down to zero. For this reason, we omit DRF Separate from most of our simulations.

3.2 MergeDRF

Algorithm 1 MergeDRF
Input: Set of users N, set of machines M
Output: Allocation A
1: Create machine R s.t. ∀i, R_i = Σ_{j∈M} r_{j,i}
2: Solve DRF on R to get allocation x
3: Round down x to get allocation x' s.t. x'_i = ⌊x_i⌋
4: while Σ_i x'_i > 0 do
5:   Pick i ∈ N with probability x'_i / Σ_{j∈N} x'_j
6:   Pick machine j (using a packing protocol)
7:   A(i, j) ← A(i, j) + u_i
8:   x'_i ← x'_i − 1
9: end while

The first algorithm we will discuss is Algorithm 1. MergeDRF creates a single machine by adding up the resource vectors of all the machines in the cluster, and solves DRF on this machine to get a solution $x$. Based on this solution, MergeDRF tries to pack $x_i$ tasks of user $i$ into the cluster's machines. In order to do this, it takes a random ordering of the tasks given by DRF on the single machine and packs each task, if possible, in order. In Figure 1 we see how MergeDRF works in a cluster with two machines $\vec{m}_1 = (4, 10)$ and $\vec{m}_2 = (5, 8)$ and users $\vec{u}_1 = (3, 1)$ and $\vec{u}_2 = (1, 4)$. First, we combine $\vec{m}_1$ and $\vec{m}_2$ to get machine $R = (9, 18)$. DRF on $R$ gives 2 tasks to $u_1$ and 3 tasks to $u_2$. We take a random order of these tasks and try to pack them (step 6) into the original machines. The random ordering affects the outcome, but the important factor is the way we pack the tasks; we will say more about this in Section 4. Let's examine Figure 1 again. The ordering on top gives a task to $u_2$ after the first task to $u_1$, which precludes $u_1$ from scheduling more than one task. But in the ordering on the bottom, $u_1$ schedules twice before $u_2$ can block it. Note that both allocations are Pareto-Optimal, and both are worse than the solution of DRF on the single combined machine, even though we did not even round down.

In general, if we run MergeDRF once, we may get an allocation that is not Pareto-Optimal, meaning there may exist a user $i$ for whom we can find a machine $\vec{m}_j$ with enough unused resources to schedule another task, that is, $\forall l \in \{1, \dots, k\}$, $m_{j,l} \ge u_{i,l}$. To achieve Pareto-Optimality we remove all the users that cannot schedule another task from the game and rerun MergeDRF with the remaining users. We repeat this procedure until no user can schedule another task.

3.3 Iterative DRF

Algorithm 2 Iterative DRF
Input: Set of users N, set of machines M
Output: Allocation A
1: Create machine R = (R_1, ..., R_k) s.t. R_i = Σ_{j∈M} r_{j,i}
2: ∀i ∈ N find dominant resource d_i of i on machine R
3: h_i ← 0 ∀i ∈ N
4: S ← [N]
5: while S ≠ ∅ do
6:   Pick i ∈ S with minimal h_i
7:   T ← {m_j : u_i fits in m_j}
8:   if T ≠ ∅ then
9:     Pick machine m_j ∈ T with highest score
10:    A(i, j) ← A(i, j) + u_i
11:    m_j ← m_j − u_i
12:    h_i ← h_i + u_{i,d_i} / R_{d_i}
13:  else
14:    S ← S \ {i}
15:  end if
16: end while

Here, we describe a modification of the algorithm in the original DRF paper that accounted for indivisible demands. In this algorithm, each task goes to the user that currently has the minimal dominant share.
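As a rough sketch of this loop (our own condensed Python, not the paper's implementation, reusing the `fits` helper from above; `score(j, free, u)` stands in for the packing heuristics of Section 4):

```python
import heapq

def iterative_drf(demands, machines, score):
    """Condensed sketch of Algorithm 2; returns task counts per (user, machine)."""
    k = len(machines[0])
    R = [sum(m[j] for m in machines) for j in range(k)]   # merged machine
    free = [list(m) for m in machines]
    tasks = [[0] * len(machines) for _ in demands]
    heap = []
    for i, u in enumerate(demands):
        d = max(range(k), key=lambda j: u[j] / R[j])      # dominant resource d_i
        heapq.heappush(heap, (0.0, i, d))
    while heap:
        h, i, d = heapq.heappop(heap)                     # minimal dominant share
        u = demands[i]
        T = [j for j in range(len(free)) if fits(u, free[j])]
        if not T:
            continue                                      # user i drops out of S
        j = max(T, key=lambda j: score(j, free[j], u))    # highest-scoring machine
        tasks[i][j] += 1
        for l in range(k):
            free[j][l] -= u[l]
        heapq.heappush(heap, (h + u[d] / R[d], i, d))
    return tasks
```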

[Figure 1: MergeDRF at work]

Parkes et al. [8] show that this is not Envy-Free or Pareto-Optimal in the indivisible case, but we modify it to work in the cluster setting, gaining Pareto-Optimality and losing Strategy-Proofness. We call this algorithm IterativeDRF because we can think of it as a discrete-time version of DRF. We pick the user with the lowest dominant share, using a priority queue, and allocate him resources from the machine that yields the best fitting score. We will explain how this score is computed in Section 4. Since dominant share is defined on one machine in the original DRF paper, we use a big machine, created as in MergeDRF by merging all the cluster's machines, and compute the dominant resource of each user there. If a user cannot fit on any machine, we remove him from the priority queue and continue. This way, we are Pareto-Optimal, but after the first user is removed, we lose some of the fairness guarantees.

3.4 Sequential MinMax

SequentialMinMax was described in [8] as an alternative to IterativeDRF on indivisible demands that is Pareto-Optimal.

Algorithm 3 Sequential Min-Max
Input: Set of users N, set of machines M
Output: Allocation A
1: Create machine R = (R_1, ..., R_k) s.t. R_i = Σ_{j∈M} r_{j,i}
2: ∀i ∈ N find dominant resource d_i of i on machine R
3: h_i ← 0 ∀i ∈ N
4: S ← [n]
5: a ← 0
6: while S ≠ ∅ do
7:   C ← {u_j : max(a, h_j + u_{j,d_j}/R_{d_j}) is minimal}
8:   for u_i ∈ C do
9:     if ∃j: (J(A)_i + 1)·u_i > (J(A)_j + 1)·u_j, or (J(A)_i + 1)·u_i = (J(A)_j + 1)·u_j and J(A)_j < J(A)_i then
10:      C ← C \ {u_i}
11:    end if
12:  end for
13:  Pick u_i ∈ C
14:  T ← {m_j : u_i fits in m_j}
15:  Pick machine j ∈ T with highest score
16:  A(i, j) ← A(i, j) + u_i
17:  m_j ← m_j − u_i
18:  for i ∈ S do
19:    if u_i fits on no machine then
20:      S ← S \ {i}
21:    end if
22:  end for
23: end while
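The selection in lines 7-13 is the subtle part. Here is a hedged sketch of one way to implement it (our own reading of the tie-breaking rule, with the vector comparison taken pointwise; names are hypothetical):

```python
def select_user(S, h, a, demands, done, R, dom):
    """Among users minimizing the new max dominant share, drop those whose
    prospective consumption (done_i + 1) * u_i pointwise dominates another's."""
    def new_share(i):
        return max(a, h[i] + demands[i][dom[i]] / R[dom[i]])
    best = min(new_share(i) for i in S)
    cand = [i for i in S if new_share(i) == best]
    def dominated(i):
        vi = [(done[i] + 1) * x for x in demands[i]]
        for j in S:
            vj = [(done[j] + 1) * x for x in demands[j]]
            if all(x > y for x, y in zip(vi, vj)):
                return True                     # strictly more envied than j
            if vi == vj and done[j] < done[i]:
                return True                     # tie broken toward fewer tasks
        return False
    keep = [i for i in cand if not dominated(i)]
    return (keep or cand)[0]
```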

The basic idea is that the ordering of tasks is different from that given by the priority queue in IterativeDRF. Instead, a task is selected that minimizes the new maximum dominant share. To break ties between these choices, they construct an Envy Graph to determine which user is the least envious and should go next. Though we use the same idea and ordering, we avoid building this entire graph in our implementation, which may make the algorithm more efficient when the number of users is quite large. Note that when we write $\vec{v} > \vec{u}$, the comparison is defined pointwise.

SequentialMinMax and IterativeDRF seem very similar at first glance. Here is an example, illustrated in Figure 2, where we can see their difference. Consider two machines $\vec{m}_1 = (50, 10)$ and $\vec{m}_2 = (10, 50)$, and users $\vec{u}_1 = (50, 2)$ and $\vec{u}_2 = (1, 5)$. IterativeDRF with WorstFit can pick either user for the first task, as both have 0 dominant share at the start. If it picks $u_1$, then its task only fits in $\vec{m}_1$, and the rest of the tasks go to $u_2$, resulting in a good allocation of one task to user 1 and ten to user 2; both get 5/6 dominant share. If it gives the first task to $u_2$, then WorstFit will put this task in machine 1. Then, for the next task, $u_1$ will be scheduled in $\vec{m}_1$, and we are equivalent to the previous case. So, again we get a good packing. But, if we use SequentialMinMax with WorstFit, we get a very different ordering of tasks. Since SequentialMinMax minimizes the new maximum dominant share, it will give the first 9 tasks to $u_2$, or else it would schedule $u_1$ and get a big maximum dominant share of 5/6. But WorstFit will not put all of these 9 tasks in $\vec{m}_2$. The score for $\vec{m}_2$, after it has taken $t$ of $u_2$'s tasks, is $\frac{10 - t - 1}{10} + \frac{50 - 5(t+1)}{50}$, while the score for an empty $\vec{m}_1$ is $\frac{50 - 1}{50} + \frac{10 - 5}{10} = 1.48$. For $t = 2$, $\frac{10 - t - 1}{10} + \frac{50 - 5(t+1)}{50} = 1.4 < 1.48$, so the third task will be allocated to $\vec{m}_1$. But then $u_1$ cannot schedule any tasks. This small example likely explains the disparity between SequentialMinMax and IterativeDRF.

[Figure 2: IterativeDRF vs SequentialMinMax]

4 Fitting Algorithms

Given a task, or a number of tasks, there are different ways one can try to fit them onto the machines. Here we describe the methods we tried, in terms of a score function $\mathrm{score} : M \times N \to \mathbb{R}$ that ranks the machines in order of preference for a given user.

First, we used FirstFit, that is, allocating to the first machine on which user $i$ can allocate a task; the score function here is $\mathrm{score}(\vec{m}_j, \vec{u}) = j$. Another natural idea is RandomFit, allocating the task to a random machine that has enough room; the score function is $\mathrm{score}(\vec{m}, \vec{u}) \sim U[0, 1]$, a random number in $[0, 1]$. Another idea, inspired by the literature, is BestFit. The score function we used is

$\mathrm{score}(\vec{m}, \vec{u}) = \sum_{j \in \{1, \dots, k\}} \frac{m_j - u_j}{m_j}.$

We tried other variants, such as normalizing by the original capacity, but we achieved the best performance with this function. BestFit chooses the machine that minimizes this function; WorstFit, on the other hand, chooses the machine that maximizes it. Note that our score function reduces to the traditional WorstFit and BestFit heuristics if we look at one dimension and uniform-sized machines. In one dimension, the traditional score function is $s = m_j - u_j$, which we try to maximize or minimize; thus we maximize or minimize $m_j$. Our score function gives $(m_j - u_j)/m_j = 1 - u_j/m_j$, so we still maximize or minimize $m_j$. Thus, they are equivalent on traditional single-dimensional bin packing.
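In code, these heuristics differ only in the score they assign. A sketch with our own names (since our algorithms pick the machine with the highest score, BestFit negates the sum it minimizes, and we read FirstFit's preference as the lowest-indexed machine):

```python
import random

def leftover_fraction(free, u):
    """Sum over resources of (m_j - u_j) / m_j on the remaining capacities."""
    return sum((f - d) / f for f, d in zip(free, u) if f > 0)

def first_fit_score(j, free, u):
    return -j                               # prefer the first machine that fits

def random_fit_score(j, free, u):
    return random.random()                  # score ~ U[0, 1]

def best_fit_score(j, free, u):
    return -leftover_fraction(free, u)      # BestFit minimizes the leftover sum

def worst_fit_score(j, free, u):
    return leftover_fraction(free, u)       # WorstFit maximizes the leftover sum
```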

5 Evaluation

In order to test the algorithms, we set up several simulators. In our artificial benchmarks, we made several assumptions. First, as in the analysis of DRF, each user can fit at least one task on every machine. To model the size of currently existing datacenters, we set the number of users to be around 50, and the number of machines ranges from 100 to 1000. Furthermore, on average, we expect every user to be able to schedule about 20 tasks on each machine if he were the only user in the system. These assumptions assure that we won't wind up in degenerate cases where no machine can schedule any tasks due to the indivisibilities. We have implemented MergeDRF and IterativeDRF with various packing protocols, as well as SequentialMinMax and DRF Separate (that is, running DRF on each machine and rounding down). We compare these algorithms in terms of time and resource utilization. In the graphs, we omit DRF Separate when its bad performance makes the comparison of the remaining algorithms too hard.

[Figure 3: Resource utilization across different fittings in IterativeDRF]

5.1 Fittings

In Figure 3 we can see how these different fitting methods compare in terms of resource utilization when using the IterativeDRF algorithm. Each algorithm (including MergeDRF) had very similar results, meaning the corresponding graphs look almost identical. WorstFit performs the best for every algorithm, so when we test the algorithms in the next section, we will exclusively use WorstFit. This data runs counter to the common intuition that BestFit is a good choice of heuristic: in the single-dimensional traditional bin packing problem, BestFit's competitive ratio is 1.5, while WorstFit is only 2-competitive. But, in our particular scenario, we can show why WorstFit performs better.

[Figure 4: WorstFit vs BestFit]

In Figure 4 we get an intuition as to why BestFit is a bad algorithm for packing our tasks into machines, and why WorstFit is a good one. If we have two machines $\vec{m}_1 = (25, 5)$ and $\vec{m}_2 = (5, 25)$ and two users $\vec{u}_1 = (1, 5)$ and $\vec{u}_2 = (5, 1)$, then, given a task of $u_1$, BestFit will allocate it to $\vec{m}_1$ and completely saturate its second resource, and will allocate tasks of $u_2$ to $\vec{m}_2$, saturating its first resource; thus it can schedule at most a total of 2 tasks, one for each user. On the other hand, WorstFit schedules tasks of $u_1$ on $\vec{m}_2$ and tasks of $u_2$ on $\vec{m}_1$, allowing a total of 10 tasks to be scheduled, 5 for each user, leaving no unused resources.
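Replaying this example with the score sketches from Section 4 (hypothetical code, reusing `fits` and the `*_fit_score` functions above) reproduces the gap:

```python
def pack(score, tasks, machines):
    """Place each task on the highest-scoring machine with room; count placements."""
    free = [list(m) for m in machines]
    placed = 0
    for u in tasks:
        T = [j for j in range(len(free)) if fits(u, free[j])]
        if T:
            j = max(T, key=lambda j: score(j, free[j], u))
            for l, d in enumerate(u):
                free[j][l] -= d
            placed += 1
    return placed

tasks = [(1, 5), (5, 1)] * 5                       # alternate u1 and u2 tasks
machines = [(25, 5), (5, 25)]
print(pack(best_fit_score, tasks, machines))       # 2: BestFit saturates bottlenecks
print(pack(worst_fit_score, tasks, machines))      # 10: WorstFit avoids them
```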

Since BestFit adversarially chooses machines that fill up bottlenecked resources, as in our example, it is not surprising that it performs the worst. Similarly, WorstFit actively avoids filling up bottlenecks, so it performs the best. FirstFit and RandomFit, on the other hand, do not use any information about the room left on each machine, so they land somewhere close to the average of BestFit and WorstFit.

[Figure: (a) Leftover Resources, (b) Min Dominant Shares, (c) Worst case distribution]

5.2 Fairness vs. Utilization

For the manager of a datacenter, resource utilization will be the primary concern, as maximizing the number of resources allocated is the same as maximizing revenue. We should note that being Pareto-Optimal is not enough: just because another task cannot be scheduled without removing one doesn't mean that the packing is efficient. In fact, all of our algorithms are Pareto-Optimal, so the differences in allocations are only a function of how filled each machine is. So, this also directly measures the efficiency of our packing algorithms. In Figure 5a we can see how our algorithms compare in terms of unused resources; the leftover resources are calculated by summing the fraction of each resource left over all machines. On the other hand, in order to measure how fair each algorithm is, we measure the minimum dominant share over all users. This serves as a direct comparison to the original DRF algorithm, as it aimed to maximize this value. So, we want to measure the tradeoff between these two metrics. We can always make this sort of tradeoff: for example, by giving all tasks to the user with the smallest demand vector, we can get an algorithm with much higher total resource utilization but 0 minimum dominant share, a 100 percent decrease in fairness.

On our random dataset, IterativeDRF and SequentialMinMax perform about the same, though close examination shows that IterativeDRF has slightly fewer leftovers than SequentialMinMax. The bigger difference is between MergeDRF and the other two algorithms. MergeDRF trades better utilization (fewer leftover resources) for a smaller minimum dominant share, meaning lower fairness. In all of our trials, MergeDRF results in about a 14 percent drop in leftovers, as seen in Figure 5a. We can see that there is not much change in minimum dominant share in Figure 5b, which has the size of machines set to about 20 times the average demand size. We tried varying the number of resources, the number of machines, and the size of machines, to see if our particular distribution gave us this result. Surprisingly, over all distributions, MergeDRF continued to have about 15 percent fewer leftovers. However, in the worst case, as shown in Figure 5c, the minimum dominant share also dropped by about 15 percent. But generating this example meant we had to remove our assumption that, in expectation, the machines can hold about 20 tasks, and make the machines much larger (holding more than 100 tasks on average). So, in cases where we have many small machines, MergeDRF is the natural choice. But when we have large machines, we are modeling closer to the single machine assumption, so the algorithms designed for a single machine perform better.
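For reference, both metrics can be computed directly from the final state of the simulator. A small sketch under the same hypothetical representation as above (`tasks` holds each user's total task count over all machines):

```python
def leftover_resources(free, machines):
    """Sum, over all machines and resources, of the fraction left unused."""
    return sum(f / cap for fr, m in zip(free, machines)
               for f, cap in zip(fr, m) if cap > 0)

def min_dominant_share(tasks, demands, R):
    """Minimum over users of tasks_i * u_{i,d(i)} / R_{d(i)} on the merged machine R."""
    shares = []
    for n_i, u in zip(tasks, demands):
        d = max(range(len(R)), key=lambda j: u[j] / R[j])
        shares.append(n_i * u[d] / R[d])
    return min(shares)
```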

5.3 Time

Since we want these algorithms to run without taking up too much of the resources we are trying to allocate, we want to ensure that the running time of each algorithm is not too large. In Figure 5 we see how the algorithms behave as the number of machines increases.

[Figure 5: Running time as number of machines increases]

Our time data fits our intuitions about the speed of the algorithms. DRF Separate solves m Linear Programs, meaning that it takes by far the longest, and it does not scale well with increasing m. IterativeDRF is quite fast, as it only needs to update a priority queue with n elements. MergeDRF can be forced to solve n LPs in the worst case, if we remove exactly one user every time we scan through our ordering, but this case would require a very specific random ordering tied to a particular partitioning of the vertices. There is also a startup cost to starting the LP solver package, so MergeDRF has a large running time for a small number of machines. But, as the number of machines increases towards m = 2000, the number of LPs is still some function of n, so MergeDRF takes only slightly more time than IterativeDRF and is faster than SequentialMinMax. This could also suggest that the bad cases that solve n different LPs are less likely to occur as the number of machines increases. So, in the scenario we care about, where the cluster has many machines, we don't sacrifice much in time to get better utilization.

6 Conclusion

Our results point to two major observations about indivisible resource allocation on clusters. First, the packing problem it introduces is quite complex, but WorstFit performs the best of the commonly used heuristics, contrary to intuition from the traditional single-dimensional problem. Second, we can trade off time and a little fairness in MergeDRF to increase utilization when compared to our extensions of the existing algorithms for a single machine (IterativeDRF and SequentialMinMax). This suggests that if datacenter managers make the divisibility and single machine assumptions, they may be losing out on revenue due to additional leftover resources.

6.1 Future Work

Though our simulations modeled a real datacenter, we are currently gaining access to a Facebook trace dataset which we expect to further exhibit the differences between the algorithms on a real workload. Though our heuristics have done quite well, we have only extended, without much analysis, the fitting algorithms that are already common in the literature for the one-dimensional bin packing problem. We hope to prove that WorstFit gives provably good expected performance given our distributions. Since our problem is multidimensional and on variable-sized machines, it may be that a different heuristic performs much better. We plan to explore the space of fitting algorithms more carefully, as they seem to affect utilization more than the small changes we saw in the fairness algorithms. We would also like to prove results about the fairness of MergeDRF. Though we cannot get Strategy-Proofness, it may be possible to get an approximate fairness guarantee in expectation due to our random ordering.

Our algorithms assumed that there was centralized computation that determined the allocation and relayed it to the different machines. However, to more closely model a cluster, it may be possible to split up not just the resources among the machines, but also the computation. We believe IterativeDRF would be easy to write as a distributed algorithm, but maintaining the random order and rerunning the LPs in MergeDRF seems nontrivial.

When designing the distributed versions of these algorithms, we would also like to explore how we can take advantage of locality in the datacenter, allowing each user to have a different score function based on locality to each machine. This would help use the advantages a cluster gives us over a single machine to offset the added complexity of our algorithms relative to the original DRF. Note that with separate clusters, without indivisibility and without the main constraint, the DRF solution can be split among the different clusters in proportion to the cluster sizes. But if we have separate clusters and include the main constraint, that the amount of resources allocated from each machine is proportional to the demand vector, then we get a different allocation problem. While we don't believe this models a real system as accurately, it would be interesting to see how this differs from the case with indivisibility, possibly allowing for a Pareto-Optimal and Strategy-Proof solution.

7 Acknowledgements

We would like to thank Ali Ghodsi for his guidance and helpful discussions. We'd also like to thank Prof. Kubiatowicz and Prof. Joseph for their useful comments.

References

[1] L. Epstein and R. van Stee. On variable-sized multidimensional packing. In Algorithms - ESA 2004. Springer, 2004.

[2] M. Garey and D. Johnson. Computers and Intractability. Freeman, San Francisco, CA, 1979.

[3] A. Ghodsi, V. Sekar, M. Zaharia, and I. Stoica. Multi-resource fair queueing for packet processing. In Proceedings of the ACM SIGCOMM 2012 Conference. ACM, 2012.

[4] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica. Dominant resource fairness: fair allocation of multiple resource types. In USENIX NSDI, 2011.

[5] A. Gutman and N. Nisan. Fair allocation without trade. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.

[6] C. Joe-Wong, S. Sen, T. Lan, and M. Chiang. Multi-resource allocation: Fairness-efficiency tradeoffs in a unifying framework. In INFOCOM, 2012 Proceedings IEEE. IEEE, 2012.

[7] R. Karp, M. Luby, and A. Marchetti-Spaccamela. A probabilistic analysis of multidimensional bin packing problems. In Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing. ACM, 1984.

[8] D. Parkes, A. Procaccia, and N. Shah. Beyond dominant resource fairness: extensions, limitations, and indivisibilities. In Proceedings of the 13th ACM Conference on Electronic Commerce. ACM, 2012.


More information

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Algorithms and Game Theory Date: 12/3/15

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Algorithms and Game Theory Date: 12/3/15 600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Algorithms and Game Theory Date: 12/3/15 25.1 Introduction Today we re going to spend some time discussing game

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms Subhash Suri November 27, 2017 1 Bin Packing Algorithms A classical problem, with long and interesting history. One of the early problems shown to be intractable. Lends to simple

More information

Lecture 16. Today: Start looking into memory hierarchy Cache$! Yay!

Lecture 16. Today: Start looking into memory hierarchy Cache$! Yay! Lecture 16 Today: Start looking into memory hierarchy Cache$! Yay! Note: There are no slides labeled Lecture 15. Nothing omitted, just that the numbering got out of sequence somewhere along the way. 1

More information

We will give examples for each of the following commonly used algorithm design techniques:

We will give examples for each of the following commonly used algorithm design techniques: Review This set of notes provides a quick review about what should have been learned in the prerequisite courses. The review is helpful to those who have come from a different background; or to those who

More information

A Virtual Laboratory for Study of Algorithms

A Virtual Laboratory for Study of Algorithms A Virtual Laboratory for Study of Algorithms Thomas E. O'Neil and Scott Kerlin Computer Science Department University of North Dakota Grand Forks, ND 58202-9015 oneil@cs.und.edu Abstract Empirical studies

More information

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University

Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University Use of KNN for the Netflix Prize Ted Hong, Dimitris Tsamis Stanford University {tedhong, dtsamis}@stanford.edu Abstract This paper analyzes the performance of various KNNs techniques as applied to the

More information

Cache-Oblivious Traversals of an Array s Pairs

Cache-Oblivious Traversals of an Array s Pairs Cache-Oblivious Traversals of an Array s Pairs Tobias Johnson May 7, 2007 Abstract Cache-obliviousness is a concept first introduced by Frigo et al. in [1]. We follow their model and develop a cache-oblivious

More information

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph.

Trees. 3. (Minimally Connected) G is connected and deleting any of its edges gives rise to a disconnected graph. Trees 1 Introduction Trees are very special kind of (undirected) graphs. Formally speaking, a tree is a connected graph that is acyclic. 1 This definition has some drawbacks: given a graph it is not trivial

More information

Approximability Results for the p-center Problem

Approximability Results for the p-center Problem Approximability Results for the p-center Problem Stefan Buettcher Course Project Algorithm Design and Analysis Prof. Timothy Chan University of Waterloo, Spring 2004 The p-center

More information

CS161 Design and Architecture of Computer Systems. Cache $$$$$

CS161 Design and Architecture of Computer Systems. Cache $$$$$ CS161 Design and Architecture of Computer Systems Cache $$$$$ Memory Systems! How can we supply the CPU with enough data to keep it busy?! We will focus on memory issues,! which are frequently bottlenecks

More information

Cost Optimal Parallel Algorithm for 0-1 Knapsack Problem

Cost Optimal Parallel Algorithm for 0-1 Knapsack Problem Cost Optimal Parallel Algorithm for 0-1 Knapsack Problem Project Report Sandeep Kumar Ragila Rochester Institute of Technology sr5626@rit.edu Santosh Vodela Rochester Institute of Technology pv8395@rit.edu

More information

MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: September 28, 2016 Edited by Ofir Geri

MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: September 28, 2016 Edited by Ofir Geri CS161, Lecture 2 MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: September 28, 2016 Edited by Ofir Geri 1 Introduction Today, we will introduce a fundamental algorithm design paradigm,

More information

CSE373: Data Structure & Algorithms Lecture 18: Comparison Sorting. Dan Grossman Fall 2013

CSE373: Data Structure & Algorithms Lecture 18: Comparison Sorting. Dan Grossman Fall 2013 CSE373: Data Structure & Algorithms Lecture 18: Comparison Sorting Dan Grossman Fall 2013 Introduction to Sorting Stacks, queues, priority queues, and dictionaries all focused on providing one element

More information

Advanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras

Advanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Advanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Lecture 18 All-Integer Dual Algorithm We continue the discussion on the all integer

More information

Greedy Algorithms CLRS Laura Toma, csci2200, Bowdoin College

Greedy Algorithms CLRS Laura Toma, csci2200, Bowdoin College Greedy Algorithms CLRS 16.1-16.2 Laura Toma, csci2200, Bowdoin College Overview. Sometimes we can solve optimization problems with a technique called greedy. A greedy algorithm picks the option that looks

More information

MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: April 1, 2015

MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: April 1, 2015 CS161, Lecture 2 MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: April 1, 2015 1 Introduction Today, we will introduce a fundamental algorithm design paradigm, Divide-And-Conquer,

More information

II (Sorting and) Order Statistics

II (Sorting and) Order Statistics II (Sorting and) Order Statistics Heapsort Quicksort Sorting in Linear Time Medians and Order Statistics 8 Sorting in Linear Time The sorting algorithms introduced thus far are comparison sorts Any comparison

More information

Maximum Clique Problem

Maximum Clique Problem Maximum Clique Problem Dler Ahmad dha3142@rit.edu Yogesh Jagadeesan yj6026@rit.edu 1. INTRODUCTION Graph is a very common approach to represent computational problems. A graph consists a set of vertices

More information

15-854: Approximations Algorithms Lecturer: Anupam Gupta Topic: Direct Rounding of LP Relaxations Date: 10/31/2005 Scribe: Varun Gupta

15-854: Approximations Algorithms Lecturer: Anupam Gupta Topic: Direct Rounding of LP Relaxations Date: 10/31/2005 Scribe: Varun Gupta 15-854: Approximations Algorithms Lecturer: Anupam Gupta Topic: Direct Rounding of LP Relaxations Date: 10/31/2005 Scribe: Varun Gupta 15.1 Introduction In the last lecture we saw how to formulate optimization

More information

A Computational Theory of Clustering

A Computational Theory of Clustering A Computational Theory of Clustering Avrim Blum Carnegie Mellon University Based on work joint with Nina Balcan, Anupam Gupta, and Santosh Vempala Point of this talk A new way to theoretically analyze

More information

TELCOM2125: Network Science and Analysis

TELCOM2125: Network Science and Analysis School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 2 Part 4: Dividing Networks into Clusters The problem l Graph partitioning

More information

The Match Fit Algorithm: A Testbed for the Computational Motivation of Attention

The Match Fit Algorithm: A Testbed for the Computational Motivation of Attention The Match Fit Algorithm: A Testbed for the Computational Motivation of Attention Joseph G. Billock 1, Demetri Psaltis 1, and Christof Koch 1 California Institute of Technology Pasadena, CA 91125, USA billgr@sunoptics.caltech.edu

More information

Core Membership Computation for Succinct Representations of Coalitional Games

Core Membership Computation for Succinct Representations of Coalitional Games Core Membership Computation for Succinct Representations of Coalitional Games Xi Alice Gao May 11, 2009 Abstract In this paper, I compare and contrast two formal results on the computational complexity

More information

General properties of staircase and convex dual feasible functions

General properties of staircase and convex dual feasible functions General properties of staircase and convex dual feasible functions JÜRGEN RIETZ, CLÁUDIO ALVES, J. M. VALÉRIO de CARVALHO Centro de Investigação Algoritmi da Universidade do Minho, Escola de Engenharia

More information

Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation

Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation Hui Wang and Peter Varman, Rice University https://www.usenix.org/conference/fast14/technical-sessions/presentation/wang

More information