Optimization of Memory Usage and Communication Requirements for a Class of Loops Implementing Multi-Dimensional Integrals


Chi-Chung Lam†, Daniel Cociorva‡, Gerald Baumgartner†, P. Sadayappan†

†Department of Computer and Information Science, The Ohio State University, Columbus, OH
{clam,gb,saday}@cis.ohio-state.edu

‡Department of Physics, The Ohio State University, Columbus, OH
cociorva@pacific.mps.ohio-state.edu

Abstract

Multi-dimensional integrals of products of several arrays arise in certain scientific computations. To optimize the performance of such computations on parallel computers, the total amount of communication needs to be minimized while staying within the available memory on each processor. In the context of these integral calculations, this paper addresses a communication minimization problem under a memory constraint. We first investigate the memory usage minimization problem for sequential computers. Based on a framework that models the relationship between loop fusion and memory usage, we propose algorithms for finding loop fusion configurations that minimize memory usage under static and dynamic memory allocation models. We suggest ways to further reduce memory usage, when necessary, at the cost of increased arithmetic operations. A practical example shows the performance improvement obtained by our algorithms on an electronic structure computation. The algorithms are then extended to the context of parallel computers, to determine the loop fusions and array distributions that minimize communication cost without exceeding the memory limit.

Supported in part by the National Science Foundation under grant DMR

Keywords: memory usage optimization, communication optimization, loop fusion, multi-dimensional integral

1 Introduction

This paper addresses the optimization of a class of loop computations that implement multi-dimensional integrals of the product of several arrays. Such integral calculations arise, for example, in the computation of electronic properties of semiconductors and metals [1, 7, 15]. The objective is to minimize the execution time of such computations on a parallel computer while staying within the available memory. In addition to the performance optimization issues pertaining to inter-processor communication and data locality enhancement, there is an opportunity to apply algebraic transformations using the properties of commutativity, associativity, and distributivity to reduce the total number of arithmetic operations. Given a specification of the required computation as a multi-dimensional sum of the product of input arrays, we first determine an equivalent sequence of multiplication and summation formulae that computes the result using a minimum number of arithmetic operations. Each formula computes and stores some intermediate results in an intermediate array. By computing the intermediate results once and reusing them multiple times, the number of arithmetic operations can be reduced. In previous work, this operation minimization problem was proved to be NP-complete and an efficient pruning search strategy was proposed [10].

The simplest way to implement an optimal sequence of multiplication and summation formulae is to compute the formulae one by one, each coded as a set of perfectly-nested loops, and to store the intermediate results produced by each formula in an intermediate array. In practice, however, the input and intermediate arrays could be so large that they cannot fit into the available memory. Hence, there is a need to fuse the loops as a means of reducing memory usage. By fusing loops between the producer loop and the consumer loop of an intermediate array, intermediate results are formed and used in a pipelined fashion, reusing the same reduced array space. The problem of finding a loop fusion configuration that minimizes memory usage without increasing the operation count is not trivial.

In this paper, we develop an optimization framework that appropriately models the relation between loop fusion and memory usage. We present algorithms that find an optimal loop fusion configuration minimizing memory usage, under both static and dynamic memory allocation models. We describe ways to trade off operation count for memory usage. The algorithms are applied to a practical physics computation to demonstrate their effectiveness. We then extend the framework to the parallel context and propose two approaches for determining the distribution of data arrays and computations among processors, and the loop fusions on each processor, that minimize inter-processor communication for load-balanced parallel execution while not exceeding the memory limit.

Reduction of arithmetic operations has traditionally been done by compilers using the technique of common subexpression elimination [4]. Chatterjee et al. consider the optimal alignment of arrays in evaluating array expressions on massively parallel machines [2, 3]. Much work has been done on improving locality and parallelism by loop fusion [8, 14, 16]. However, this paper considers a different use of loop fusion: reducing the array sizes and memory usage of automatically synthesized code containing nested loop structures. Traditional compiler research does not address this use of loop fusion because the problem does not arise with manually-produced programs. The contraction of arrays into scalars through loop fusion is studied in [6], but it is motivated by data locality enhancement rather than memory reduction. Loop fusion in the context of delayed evaluation of array expressions in APL programs is discussed in [5], but it is likewise not aimed at minimizing array sizes. We are unaware of any work on fusion of multi-dimensional loop nests into imperfectly-nested loops as a means of reducing memory usage.

The rest of this paper is organized as follows. Section 2 describes the operation minimization problem. Section 3 studies the use of loop fusion to reduce array sizes and presents algorithms for finding an optimal loop fusion configuration that minimizes memory usage. Section 4 considers the optimal partitioning of data and loops to minimize communication and computational costs for execution on parallel machines under a memory constraint and describes two approaches to the problem. Section 5 provides conclusions.

2 Operation Minimization

In the class of computations considered, the final result to be computed can be expressed as a multi-dimensional integral of the product of many input arrays. Due to commutativity, associativity, and distributivity, there are many different ways to obtain the same final result, and they could differ widely in the number of floating point operations required. The problem of finding an equivalent form that computes the result with the least number of operations is not trivial, so a software tool for doing this is desirable.

Consider, for example, the multi-dimensional integral shown in Figure 1(a). If implemented directly as expressed (i.e., as a single set of perfectly-nested loops), the computation would require roughly 3 × N_i × N_j × N_k × N_l arithmetic operations (two multiplications and one addition for each of the N_i N_j N_k N_l index tuples). However, assuming that associative reordering of the operations and use of the distributive law of multiplication over addition is satisfactory for the floating-point computations, the above computation can be rewritten in various ways. One equivalent form that requires only about N_i N_j + 2 N_j N_k N_l + 2 N_j N_k operations is given in Figure 1(b). It expresses the sequence of steps in computing the multi-dimensional integral as a sequence of formulae. Each formula computes some intermediate result, and the last formula gives the final result. A formula is either a product of two input/intermediate arrays or an integral/summation, over one index, of an input/intermediate array. A sequence of formulae can also be represented as an expression tree. For instance, Figure 1(c) shows the expression tree corresponding to the example formula sequence. The problem of finding a formula sequence that minimizes the number of operations has been proved to be NP-complete [10]. A pruning search algorithm for finding such a formula sequence is given below.

W(k) = Σ_{i,j,l} A(i,j) × B(j,k,l) × C(k,l)

(a) A multi-dimensional integral

f1(j) = Σ_i A(i,j)
f2(j,k,l) = B(j,k,l) × C(k,l)
f3(j,k) = Σ_l f2(j,k,l)
f4(j,k) = f1(j) × f3(j,k)
W(k) = f5(k) = Σ_j f4(j,k)

(b) A formula sequence for computing (a)

(c) An expression tree representation of (b) [tree diagram not reproducible in text: A feeds f1; B and C feed f2; f2 feeds f3; f1 and f3 feed f4; f4 feeds f5]

Figure 1: An example multi-dimensional integral and two representations of a computation.

1. Form a list of the product terms of the multi-dimensional integral. Let T_j denote the j-th product term and dimens(T_j) the set of index variables in T_j. Set r to zero and n to the number of product terms.

2. While there exists a summation index (say i) that appears in exactly one term (say T_j) in the list, increment r and n and create a formula f_r = Σ_i T_j, where dimens(f_r) = dimens(T_j) − {i}. Remove T_j from the list. Append f_r to the list as T_n.

3. Increment r and n and form a formula f_r = T_j × T_k, where T_j and T_k are two terms in the list with j ≠ k, giving priority to terms that have exactly the same set of indices. The indices for f_r are dimens(f_r) = dimens(T_j) ∪ dimens(T_k). Remove T_j and T_k from the list. Append f_r to the list as T_n. Go to step 2.

4. When steps 2 and 3 cannot be performed any more, a valid formula sequence has been obtained. To obtain all valid sequences, exhaust all alternatives in step 3 using depth-first search.
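To make the search concrete, the following is a minimal Python sketch of steps 1-4 (our illustration, not the authors' implementation): terms are frozensets of index names, N maps each index to its range, the cost model simply charges the product of the loop ranges for each formula, and the step-3 priority heuristic is realized only through branching order.

from itertools import combinations
from math import prod

def min_ops(terms, sum_indices, N):
    """Depth-first pruning search over formula sequences. Returns the
    minimum operation count found under the simple cost model above."""
    best = [float("inf")]

    def search(terms, remaining, ops):
        if ops >= best[0]:
            return                                  # prune this branch
        # Step 2: sum out every index that occurs in exactly one term.
        changed = True
        while changed:
            changed = False
            for i in list(remaining):
                holders = [t for t in terms if i in t]
                if len(holders) == 1:
                    t = holders[0]
                    ops += prod(N[x] for x in t)    # additions for the summation
                    terms = [u for u in terms if u is not t] + [t - {i}]
                    remaining = remaining - {i}
                    changed = True
        if ops >= best[0]:
            return
        if len(terms) == 1:
            best[0] = ops                           # step 4: a complete sequence
            return
        # Step 3: branch over all pairs of terms (depth-first search).
        for a, b in combinations(terms, 2):
            rest = [u for u in terms if u is not a and u is not b]
            search(rest + [a | b], remaining,
                   ops + prod(N[x] for x in a | b)) # multiplications for the product
    search([frozenset(t) for t in terms], frozenset(sum_indices), 0)
    return best[0]

For the integral of Figure 1, min_ops([{'i','j'}, {'j','k','l'}, {'k','l'}], {'i','j','l'}, {'i': 500, 'j': 100, 'k': 40, 'l': 15}) exhausts the alternatives of step 3 and returns the cost of the best formula sequence found.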

3 Memory Usage Minimization

In implementing the computation represented by an operation-count-optimal formula sequence (or expression tree), there is a need to perform loop fusion to reduce the sizes of the arrays. Without fusing the loops, the arrays could be too large to fit into the available memory. There are many different ways to fuse the loops, and they can result in different memory usage. This section addresses the problem of finding a loop fusion configuration for a given formula sequence that uses the least amount of memory. Section 3.1 introduces the memory usage minimization problem. Section 3.2 describes some preliminary concepts that we use to formulate our solutions to the problem. Sections 3.3 and 3.4 present algorithms for finding memory-optimal loop fusion configurations under static memory allocation and dynamic memory allocation, respectively. Section 3.5 explores ways to further reduce memory usage at the cost of increasing the number of arithmetic operations. Section 3.6 illustrates how the application of the algorithms to an example physics computation improves its performance.

3.1 Problem Description

Consider again the expression tree shown in Figure 1(c). A naive way to implement the computation is to have a set of perfectly-nested loops for each node in the tree, as shown in Figure 2(a). The brackets indicate the scopes of the loops. Figure 2(b) shows how the sizes of the arrays may be reduced by the use of loop fusions. It shows the resulting loop structure after fusing all the loops between A and f1, all the loops among B, C, f2, and f3, and all the loops between f4 and f5. Here, A, B, C, f2, and f4 are reduced to scalars. After fusing all the loops between a node and its parent, all dimensions of the child array are no longer needed and can be eliminated. The elements in the reduced arrays are now reused to hold different values at different iterations of the fused loops. Each of those values was held by a different array element before the loops were fused (as in Figure 2(a)). Note that some loop nests (such as those for B and C) are reordered, and some loops within loop nests (such as the j-, k-, and l-loops for B, f2, and f3) are permuted, in order to facilitate loop fusions.

For now, we assume the leaf node arrays (i.e., input arrays) can be generated one element at a time (by the genv function for array v), so that loop fusions with their parents are allowed. This assumption holds for arrays in which the value of each element is a function of the array subscripts, as in many arrays in the physics computations that we work on. As will become clear later on, the case where an input array has to be read in or produced in slices or in its entirety can be handled by disabling the fusion of some or all of the loops between the leaf node and its parent.

Figure 2(c) shows another possible loop fusion configuration, obtained by fusing all the j-loops and then all the k-loops and l-loops inside them. The sizes of all arrays except C and f5 are smaller. By fusing the j-, k-, and l-loops between those nodes, the j-, k-, and l-dimensions of the corresponding arrays can be eliminated. Hence, f1, B, f2, f3, and f4 are reduced to scalars, while A becomes a one-dimensional array. In general, fusing a t-loop between a node v and its parent eliminates the t-dimension of the array v and reduces the array size by a factor of N_t. In other words, the size of an array after loop fusions equals the product of the ranges of the loops that are not fused with its parent. We only consider fusions of loops among nodes that are all transitively related by (i.e., form a transitive closure over) parent-child relations.

Figure 2: Three loop fusion configurations for the expression tree in Figure 1.

(a)
for i
  for j
    A[i,j] = genA(i,j)
for j
  for k
    for l
      B[j,k,l] = genB(j,k,l)
for k
  for l
    C[k,l] = genC(k,l)
initialize f1
for i
  for j
    f1[j] += A[i,j]
for j
  for k
    for l
      f2[j,k,l] = B[j,k,l] * C[k,l]
initialize f3
for j
  for k
    for l
      f3[j,k] += f2[j,k,l]
for j
  for k
    f4[j,k] = f1[j] * f3[j,k]
initialize f5
for j
  for k
    f5[k] += f4[j,k]

(b)
initialize f1
for i
  for j
    A = genA(i,j)
    f1[j] += A
initialize f3
for k
  for l
    C = genC(k,l)
    for j
      B = genB(j,k,l)
      f2 = B * C
      f3[j,k] += f2
initialize f5
for j
  for k
    f4 = f1[j] * f3[j,k]
    f5[k] += f4

(c)
for k
  for l
    C[k,l] = genC(k,l)
initialize f5
for j
  for i
    A[i] = genA(i,j)
  initialize f1
  for i
    f1 += A[i]
  for k
    initialize f3
    for l
      B = genB(j,k,l)
      f2 = B * C[k,l]
      f3 += f2
    f4 = f1 * f3
    f5[k] += f4

(d)-(f) show the fusion graphs corresponding to configurations (a)-(c); these node-and-edge diagrams are not reproducible in text form.

Fusing loops between unrelated nodes (such as fusing siblings without fusing their parent) has no effect on array sizes. We also restrict our attention for now to loop fusion configurations that do not increase the operation count. The tradeoff between memory usage and arithmetic operations is considered in Section 3.5.

In the class of loops considered in this paper, the only dependence relations are those between children and parents, and array subscripts are simply loop index variables. (When array subscripts are not simple loop variables, as many researchers have studied, more dependence relations exist, which prevent some loop rearrangements and/or loop fusions. In that case, a restricted set of loop fusion configurations would need to be searched in minimizing memory usage.) So, loop permutations, loop nest reorderings, and loop fusions are always legal as long as child nodes are evaluated before their parents. This freedom allows the loops to be permuted, reordered, and fused in a large number of ways that differ in memory usage. Finding a loop fusion configuration that uses the least memory is not trivial. We believe this problem is NP-complete but have not found a proof yet.

Fusion graphs. Let T be an expression tree. For any given node v ∈ T, let subtree(v) be the set of nodes in the subtree rooted at v, v.parent be the parent of v, and v.indices be the set of loop indices for v (including the summation index v.sumindex if v is a summation node). A loop fusion configuration can be represented by a graph called a fusion graph, which is constructed from T as follows.

1. Replace each node v in T by a set of vertices, one for each index i ∈ v.indices.
2. Remove all tree edges in T for clarity.
3. For each loop fused (say, of index i) between a node and its parent, connect the i-vertices of the two nodes with a fusion edge.
4. For each pair of vertices with matching index between a node and its parent, if they are not already connected with a fusion edge, connect them with a potential fusion edge.

Figure 2 shows the fusion graphs alongside the loop fusion configurations. In the figure, solid lines are fusion edges and dotted lines are potential fusion edges, i.e., fusion opportunities not exploited. As an example, consider the loop fusion configuration in Figure 2(b) and the corresponding fusion graph in Figure 2(e). Since the j-, k-, and l-loops are fused between B and f2, there are three fusion edges, one for each of the three loops, between B and f2 in the fusion graph. Also, since no loops are fused between f3 and f4, the edges between f3 and f4 in the fusion graph remain potential fusion edges.

In a fusion graph, we call each connected component of fusion edges (i.e., a maximal set of connected fusion edges) a fusion chain, which corresponds to a fused loop in the loop structure. The scope of a fusion chain c, denoted scope(c), is defined as the set of nodes it spans. In Figure 2(f), there are three fusion chains, one for each of the j-, k-, and l-loops; the scope of the shortest fusion chain is {B, f2, f3}. The scopes of any two fusion chains in a fusion graph must either be disjoint or be a subset/superset of each other. Scopes of fusion chains do not partially overlap because loops do not (i.e., loops must be either separate or nested). Therefore, any fusion graph containing fusion chains whose scopes partially overlap is illegal and does not correspond to any loop fusion configuration.

Fusion graphs help us visualize the structure of the fused loops and find further fusion opportunities. If we can find a set of potential fusion edges that, when converted to fusion edges, does not lead to partially overlapping scopes of fusion chains, then we can perform the corresponding loop fusions and reduce the size of some arrays. For example, the i-loops between A and f1 in Figure 2(f) can be further fused, and array A would be reduced to a scalar. If converting all potential fusion edges in a fusion graph to fusion edges does not make the fusion graph illegal, then we can completely fuse all the loops and achieve optimal memory usage. But for many fusion graphs of real-life loop configurations (including the ones in Figure 2), this does not hold. Instead, potential fusion edges may be mutually prohibitive: fusing one loop could prevent the fusion of another. In Figure 2(e), fusing the j-loops between f1 and f4 would disallow the fusion of the k-loops between f3 and f4.
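The disjoint-or-nested condition on fusion-chain scopes is easy to check mechanically. The following small Python sketch (our illustration, not code from the paper) tests a set of candidate fusion chains, each given as the set of nodes its fused loop spans:

from itertools import combinations

def legal(chains):
    """chains: list of (index, scope) pairs, where scope is a frozenset
    of node names spanned by one fusion chain. Legal iff every two
    scopes are disjoint or one contains the other."""
    for (_, s), (_, t) in combinations(chains, 2):
        common = s & t
        if common and common != s and common != t:
            return False        # partially overlapping scopes: illegal
    return True

# The mutual prohibition discussed above, for Figure 2(e):
jf = ('j', frozenset({'A', 'f1', 'f4', 'f5'}))              # fuse j between f1 and f4
kf = ('k', frozenset({'B', 'C', 'f2', 'f3', 'f4', 'f5'}))   # fuse k between f3 and f4
assert not legal([jf, kf])      # the two fusions cannot coexist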

Although a fusion graph specifies what loops are fused, it does not fully determine the permutations of the loops and the ordering of the loop nests. As we will see in Section 3.4, under dynamic memory allocation, reordering loop nests can alter memory usage without changing array sizes.

3.2 Preliminaries

So far, we have described the fusion between a node and its parent by the set of fused loops (or their loop indices, such as {j,k}). But in order to compare loop fusion configurations for a subtree, it is desirable to include information about the relative scopes of the fused loops in the subtree.

Scope and fusion scope of a loop. The scope of a loop of index i in a subtree rooted at v, denoted scope_i(v), is defined in the usual sense as the set of nodes in the subtree that the fused loop spans. That is, if the i-loop is fused, scope_i(v) = scope(c) ∩ subtree(v), where c is the fusion chain for the i-loop with v ∈ scope(c). If the i-loop of v is not fused, then scope_i(v) = ∅. We also define the fusion scope of an i-loop in a subtree rooted at v as fscope_i(v) = scope_i(v) if the i-loop is fused between v and its parent, and fscope_i(v) = ∅ otherwise. As an example, for the fusion graph in Figure 2(e), scope_k(f3) = {B, C, f2, f3}, but fscope_k(f3) = ∅ since the k-loop is not fused between f3 and f4.

Indexset sequence. To describe the relative scopes of a set of fused loops, we introduce the notion of an indexset sequence, which is defined as an ordered list of disjoint, non-empty sets of loop indices. For example, ⟨{j},{k,l}⟩ is an indexset sequence. For simplicity, we write each indexset in an indexset sequence as a string; thus, ⟨{j},{k,l}⟩ is written as ⟨j,kl⟩. Let f and f' be indexset sequences. We denote by |f| the number of indexsets in f, by f_r the r-th indexset in f, and by Set(f) the union of all indexsets in f, i.e., Set(f) = ∪_{1≤r≤|f|} f_r. For instance, |⟨j,kl⟩| = 2, ⟨j,kl⟩_1 = {j}, and Set(⟨j,kl⟩) = {j,k,l}.

We say that f' is a prefix of f if |f'| ≤ |f|, f'_{|f'|} ⊆ f_{|f'|}, and f'_r = f_r for all 1 ≤ r < |f'|. We write this relation as prefix(f', f). So ⟨⟩, ⟨j⟩, ⟨j,k⟩, ⟨j,l⟩, and ⟨j,kl⟩ are prefixes of ⟨j,kl⟩, but ⟨k⟩ is not. The concatenation of f and an indexset x, denoted f ⊕ x, is defined as the indexset sequence f'' such that if x ≠ ∅, then |f''| = |f| + 1, f''_{|f''|} = x, and f''_r = f_r for all 1 ≤ r < |f''|; otherwise, f'' = f.

Fusion. We use the notion of an indexset sequence to define a fusion. Intuitively, the loops fused between a node and its parent are ranked by their fusion scopes in the subtree, from largest to smallest; two loops with the same fusion scope have the same rank (i.e., are in the same indexset). For example, in Figure 2(f), the fusion between B and f2 is ⟨jkl⟩, and the fusion between f4 and f5 is ⟨j,k⟩ (because the fused j-loop covers two more nodes, A and f1). Formally, a fusion between a node v and v.parent is an indexset sequence f such that

1. Set(f) ⊆ v.indices ∩ v.parent.indices,
2. for all i ∈ Set(f), the i-loop is fused between v and v.parent, and
3. for all i ∈ f_r and j ∈ f_{r'}: (a) r = r' iff fscope_i(v) = fscope_j(v), and (b) r < r' iff fscope_i(v) ⊃ fscope_j(v).

Nesting. Similarly, a nesting of the loops at a node v can be defined as an indexset sequence. Intuitively, the loops at a node are ranked by their scopes in the subtree; two loops have the same rank (i.e., are in the same indexset) if they have the same scope. For example, in Figure 2(e), the loop nesting at f2 is ⟨kl,j⟩, at f4 it is ⟨jk⟩, and at f3 it is ⟨kl,j⟩. Formally, a nesting of the loops at a node v is an indexset sequence g such that

1. Set(g) = v.indices, and
2. for all i ∈ g_r and j ∈ g_{r'}: (a) r = r' iff scope_i(v) = scope_j(v), and (b) r < r' iff scope_i(v) ⊃ scope_j(v).

By definition, the loop nesting at a leaf node v must be ⟨v.indices⟩, because all loops at v have empty scope.

Legal fusion. A legal fusion graph (corresponding to a loop fusion configuration) for an expression tree T can be built up in a bottom-up manner by extending and merging legal fusion graphs for the subtrees of T. For a given node v, the nesting at v summarizes the fusion graph for the subtree rooted at v and determines what fusions are allowed between v and its parent. A fusion f is legal for a nesting g at v if prefix(f, g) and Set(f) ⊆ v.parent.indices. This is because, to keep the fusion graph legal, loops with larger scopes must be fused before those with smaller scopes, and only loops common to both v and its parent may be fused. For example, consider the fusion graph for the subtree rooted at f2 in Figure 2(e). Since the nesting at f2 is ⟨kl,j⟩ and f3.indices = {j,k,l}, the legal fusions between f2 and f3 are ⟨⟩, ⟨k⟩, ⟨l⟩, ⟨kl⟩, and ⟨kl,j⟩. Notice that all legal fusions for a node v are prefixes of a maximal legal fusion, which can be expressed as MaxFusion(g, v) = max{f : prefix(f, g) and Set(f) ⊆ v.parent.indices}, where g is the nesting at v. In Figure 2(e), the maximal legal fusion for f2 is ⟨kl,j⟩, and for B it is ⟨jkl⟩.

Resulting nesting. Let u be the parent of a node v. If v is the only child of u, then the loop nesting at u resulting from a fusion f between u and v is obtained by the function ExtNesting(f, u) = f ⊕ (u.indices − Set(f)). For example, in Figure 2(e), if the fusion between f2 and f3 is ⟨kl⟩, then the nesting at f3 would be ⟨kl,j⟩.

Compatible nestings. Suppose v has a sibling v', f is the fusion between u and v, and f' is the fusion between u and v'. For the fusion graph for the subtree rooted at u (which is merged from those of v and v') to be legal, g = ExtNesting(f, u) and g' = ExtNesting(f', u) must be compatible,

according to the following condition: for all indices i and j occurring in both g and g', if i ranks strictly before j in one nesting, then i must not rank after j in the other. This requirement ensures that an i-loop having a larger scope than a j-loop in one subtree does not have a smaller scope than the j-loop in the other subtree. If g and g' are compatible, the resulting loop nesting at u (as merged from g and g') is the indexset sequence g'' in which the loops at u are re-ranked by their combined scopes in the two subtrees: two indices have the same rank in g'' iff they have the same rank in both g and g', and index i ranks before index j in g'' iff i ranks no later than j in both g and g' and strictly earlier in at least one. As an example, in Figure 2(e), if the fusion between f1 and f4 is ⟨j⟩ and the fusion between f3 and f4 is ⟨k⟩, then g = ExtNesting(⟨j⟩, f4) = ⟨j,k⟩ and g' = ExtNesting(⟨k⟩, f4) = ⟨k,j⟩ would be incompatible. But if the fusion between f1 and f4 is changed to ⟨⟩, then g = ExtNesting(⟨⟩, f4) = ⟨jk⟩ would be compatible with g', and the resulting nesting at f4 would be ⟨k,j⟩. A procedure for checking whether g and g' are compatible and for forming g'' from them is provided in Section 3.3.

The more-constraining relation on nestings. A nesting g at a node v is said to be more or equally constraining than another nesting g' at the same node, denoted g ≼_v g', if for every legal fusion graph G for T in which the nesting at v is g, there exists a legal fusion graph G' for T in which the nesting at v is g' such that the subgraphs of G and G' induced by T − subtree(v) are identical. In other words, g ≼_v g' means that any loop fusion configuration for the rest of the expression tree that works with g also works with g'. This relation allows us to prune effectively among the large number of loop fusion configurations for a subtree in Section 3.3. It can be proved that the necessary and sufficient condition for g ≼_v g' is that, with m = MaxFusion(g, v) and m' = MaxFusion(g', v), every index of m appears in m' and, for all i ∈ m_r and j ∈ m_s, the ranks r' and s' with i ∈ m'_{r'} and j ∈ m'_{s'} satisfy r' = s' whenever r = s, and r' ≤ s' whenever r < s. Comparing the nesting at f2 between Figure 2(e) and (f), the nesting ⟨kl,j⟩ in (e) is more constraining than the nesting ⟨jkl⟩ in (f). A procedure for determining whether g ≼_v g' holds is given in Section 3.3.
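The indexset-sequence operations above are small set manipulations. The following Python sketch (our rendering, with a sequence encoded as a list of frozensets) shows prefix, concatenation, ExtNesting, and MaxFusion as just defined:

def is_prefix(fp, f):
    """prefix(fp, f): equal indexsets up to the last one of fp, which
    may be a subset of the indexset at the same position in f."""
    if len(fp) > len(f):
        return False
    return all(a == b for a, b in zip(fp[:-1], f)) and \
           (not fp or fp[-1] <= f[len(fp) - 1])

def concat(f, x):
    """f ⊕ x: append indexset x (if non-empty) as a new last indexset."""
    return f + [frozenset(x)] if x else f

def ext_nesting(fusion, u_indices):
    """ExtNesting(f, u): nesting at the parent u after fusion f."""
    used = frozenset().union(*fusion) if fusion else frozenset()
    return concat(list(fusion), frozenset(u_indices) - used)

def max_fusion(nesting, parent_indices):
    """MaxFusion(g, v): the largest prefix of g whose indices all occur
    at v's parent; all legal fusions are prefixes of it."""
    out = []
    for s in nesting:
        kept = s & frozenset(parent_indices)
        if kept:
            out.append(kept)
        if kept != s:
            break               # deeper loops can no longer be fused
    return out

For example, ext_nesting([frozenset('kl')], 'jkl') yields [frozenset('kl'), frozenset('j')], matching the ⟨kl⟩ to ⟨kl,j⟩ example above.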

3.3 Static Memory Allocation

Under the static memory allocation model, all the arrays in a program exist during the entire computation, so the memory usage of a loop fusion configuration is simply the sum of the sizes of all the arrays (including those reduced to scalars). Figures 3 and 4 show a bottom-up, dynamic programming algorithm for finding a memory-optimal loop fusion configuration for a given expression tree T. For each node v in T, it computes a set of solutions v.solns for the subtree rooted at v. Each solution s in v.solns represents a loop fusion configuration for the subtree rooted at v and contains the following information: the loop nesting s.nesting at v, the fusion s.fusion between v and its parent, the memory usage s.mem so far, and the pointers s.src1 and s.src2 to the corresponding solutions for the children of v. The set of solutions v.solns is obtained by the following steps. First, if v is a leaf node, initialize the solution set to contain a single solution using InitSolns. Otherwise, take the solution set from a child v.child1 of v and, if v has two children, merge it (using MergeSolns) with the compatible solutions from the other child v.child2. Then, prune the solution set to remove the inferior solutions using PruneSolns. A solution s is inferior to another unpruned solution s' if s.nesting is more or equally constraining than s'.nesting and s does not use less memory than s'. Next, extend the solution set by considering all possible legal fusions between v and its parent (see ExtendSolns). The size of array v is added to the memory usage by AddMemUsage. Inferior solutions are again removed.

Although the complexity of the algorithm is exponential in the number of index variables, and the number of solutions could in theory grow exponentially with the size of the expression tree, the number of index variables in practical applications is usually small, and there is indication that the pruning is effective in keeping the solution set at each node small.

The algorithm assumes that the leaf nodes may be freely fused with their parents and that the root node array must be available in its entirety at the end of the computation. If these assumptions do not hold, the InitFusible procedure can be easily modified to restrict or expand the allowable fusions for those nodes.

MinMemFusion(T):
  InitFusible(T)
  foreach node v in some bottom-up traversal of T
    if v.nchildren == 0 then
      S1 = InitSolns(v)
    else
      S1 = v.child1.solns
      if v.nchildren == 2 then
        S1 = MergeSolns(S1, v.child2.solns)
      S1 = PruneSolns(S1, v)
    v.solns = ExtendSolns(S1, v)
  T.root.optsoln = the single element in T.root.solns
  foreach node v in some top-down traversal of T
    s = v.optsoln
    v.optfusion = s.fusion
    s1 = s.src1
    if v.nchildren == 1 then
      v.child1.optsoln = s1
    else if v.nchildren == 2 then
      v.child1.optsoln = s1.src1
      v.child2.optsoln = s1.src2

InitFusible(T):
  foreach v ∈ T
    if v == T.root then v.fusible = ∅
    else v.fusible = v.indices ∩ v.parent.indices

InitSolns(v):
  s = new solution; s.nesting = ⟨v.fusible⟩
  InitMemUsage(s)
  return {s}

MergeSolns(S1, S2):
  S = ∅
  foreach s1 ∈ S1
    foreach s2 ∈ S2
      g = MergeNesting(s1.nesting, s2.nesting)
      if g ≠ ⊥ then              // if s1 and s2 are compatible
        s = new solution; s.nesting = g; s.src1 = s1; s.src2 = s2
        MergeMemUsage(s, s1, s2)
        AddSoln(s, S)
  return S

PruneSolns(S1, v):
  S = ∅
  foreach s1 ∈ S1
    s = a copy of s1; s.nesting = MaxFusion(s1.nesting, v)
    AddSoln(s, S)
  return S

ExtendSolns(S1, v):
  S = ∅
  foreach s1 ∈ S1
    foreach prefix f of s1.nesting
      s = new solution; s.fusion = f; s.nesting = ExtNesting(f, v.parent); s.src1 = s1
      AddMemUsage(s, v, FusedSize(f, v), s1)
      AddSoln(s, S)
  return S

Figure 3: Algorithm for finding a memory-optimal loop fusion configuration under static memory allocation.

AddSoln(s, S):
  foreach s' ∈ S
    if Inferior(s, s') then return
    else if Inferior(s', s) then S = S − {s'}
  S = S ∪ {s}

MergeNesting(g, g'):
  g'' = ⟨⟩; r = r' = 1; x = x' = ∅
  while r ≤ |g| or r' ≤ |g'|
    if x = ∅ then x = g_r; r = r + 1
    if x' = ∅ then x' = g'_{r'}; r' = r' + 1
    y = x ∩ x'
    if y = ∅ then return ⊥      // g and g' are incompatible
    g'' = g'' ⊕ y
    x = x − y; x' = x' − y
  end while
  return g''                    // g and g' are compatible

g ≼_v g':                       // test if g is more/equally constraining than g'
  // (applied to m = MaxFusion(g, v) and m' = MaxFusion(g', v))
  r' = 1; x' = ∅
  for r = 1 to |m|
    if x' = ∅ then
      if r' > |m'| then return false
      x' = m'_{r'}; r' = r' + 1
    if m_r ⊄ x' then return false
    x' = x' − m_r
  return true

InitMemUsage(s): s.mem = 0

AddMemUsage(s, v, size, s1): s.mem = s1.mem + size

MergeMemUsage(s, s1, s2): s.mem = s1.mem + s2.mem

Inferior(s, s') = (s.nesting ≼_v s'.nesting) and (s.mem ≥ s'.mem)

FusedSize(f, v) = Π_{i ∈ (v.indices − {v.sumindex}) − Set(f)} N_i

ExtNesting(f, u) = f ⊕ (u.indices − Set(f))

MaxFusion(g, v) = max{f : prefix(f, g) and Set(f) ⊆ v.parent.indices}

Set(f) = ∪_{1≤r≤|f|} f_r

Figure 4: Supporting procedures for the algorithm in Figure 3.
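As an illustration of the pruning machinery (our sketch, not the paper's code), AddSoln and Inferior translate almost directly into Python if solutions are dictionaries and the more-or-equally-constraining test is supplied as a function:

def inferior(s, t, constrains):
    """s is inferior to t: s's nesting is at least as constraining
    and s uses at least as much memory."""
    return constrains(s["nesting"], t["nesting"]) and s["mem"] >= t["mem"]

def add_soln(s, solns, constrains):
    """Insert s into the solution set unless it is inferior to a kept
    solution; drop kept solutions that s renders inferior."""
    for t in solns:
        if inferior(s, t, constrains):
            return solns
    return [t for t in solns if not inferior(t, s, constrains)] + [s]

Because the constraining relation is only a partial order, mutually incomparable solutions survive side by side, which is why the solution sets at inner nodes can hold more than one candidate.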

Figure 5: Solution sets for the subtrees in the example. [The table is not reproducible here; for each node it lists the candidate solutions with their src line numbers, nesting, fusion, resulting nesting for the parent (ext-nest), and memory usage, and marks the solutions on the optimal path in the opt column.]

To illustrate how the algorithm works, consider again the empty fusion graph in Figure 2(d) for the expression tree in Figure 1(c). Let N_i = 500, N_j = 100, N_k = 40, and N_l = 15. There are 4 different fusions between A and f1. Among them, only the full fusion ⟨ij⟩ is in A.solns, because all other fusions result in more constraining nestings and use more memory than the full fusion, and so are pruned. However, this does not happen to the fusions between C and f2, since the resulting nesting ⟨kl,j⟩ of the full fusion ⟨kl⟩ is not less constraining than those of the other 3 possible fusions. Then, solutions from B and C are merged together to form solutions for f2. For example, when the two full-fusion solutions from B and C are merged, the merged nesting for f2 is ⟨kl,j⟩, which can then be extended by full fusion (between f2 and f3) to form a full-fusion solution for f2 that has a memory usage of only 3 scalars. Again, since this solution is not the least constraining one, other solutions cannot be pruned. Although this solution is optimal for the subtree rooted at f2, it turns out to be non-optimal for the entire expression tree. Figure 5 shows the solution sets for all of the nodes.

The src column contains the line numbers of the corresponding solutions for the children. The ext-nest column shows the resulting nesting for the parent. A mark indicates that the solution forms part of an optimal solution for the entire expression tree. The fusion graph for the optimal solution is shown in Figure 6(a).

Figure 6: An optimal solution for the example. [(a) is the fusion graph, not reproducible here; (b) is the fused loop structure:]

initialize f1
for i
  for j
    A = genA(i,j)
    f1[j] += A
initialize f5
for k
  for l
    C[l] = genC(k,l)
  for j
    initialize f3
    for l
      B = genB(j,k,l)
      f2 = B * C[l]
      f3 += f2
    f4 = f1[j] * f3
    f5[k] += f4

Once an optimal solution is obtained, we can generate the corresponding fused loop structure from it. The following procedure determines an evaluation order of the nodes:

1. Initialize set P to contain the single node T.root and list L to an empty list.
2. While P is not empty, remove from P a node v whose v.optfusion is maximal among all nodes in P, insert v at the beginning of L, and add the children of v (if any) to P.
3. When P is empty, L contains the evaluation order.

Putting the loops around the array evaluation statements is then straightforward. The initialization of an array can be placed inside the innermost loop that contains both the evaluation and the use of the array. The optimal loop fusion configuration for the example expression tree is shown in Figure 6(b).
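The ordering procedure is a small worklist loop; here is a Python sketch of it (ours; the "maximal optfusion" comparison is abstracted into a key function, e.g., how deeply the node is fused with its parent):

def evaluation_order(root, children, fusion_rank):
    """Return nodes in evaluation order: repeatedly take from P a node
    whose optfusion is maximal (per fusion_rank), prepend it to L, and
    release its children into P."""
    P, L = [root], []
    while P:
        v = max(P, key=fusion_rank)
        P.remove(v)
        L.insert(0, v)
        P.extend(children(v))
    return L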

3.4 Dynamic Memory Allocation

Under dynamic memory allocation, space for arrays can be allocated and deallocated as needed. We allow an array to be allocated and deallocated multiple times by putting the allocate/deallocate statements inside some loops, but each array must be allocated and deallocated in its entirety. We do not consider sharing of space between arrays. Given a loop fusion configuration, the positions of the allocate/deallocate statements are determined as follows. Let v be an array, e be the statement that evaluates v, u be the statement that uses v, and the t-loop be the innermost loop that contains both e and u. The latest point at which an allocate(v) statement can be placed is inside the t-loop, before e and before any loops that surround e and are inside the t-loop. The earliest point at which a deallocate(v) statement can be placed is inside the t-loop, after u and after any loops that surround u and are inside the t-loop. The memory usage of a loop fusion configuration is no longer the sum of the array sizes, but the maximum total size of the arrays that are allocated at any point in time.

Another difference between dynamic allocation and static allocation is that, with dynamic allocation, the evaluation order of nodes having equal fusion with their parents can affect memory usage, even though the sizes of the individual arrays remain unchanged. Consider the fusion graph shown in Figure 7(a). Since f1 and f3 are equally fused with f4, either of the two subtrees rooted at f1 and f3 can be evaluated first. Figures 7(b) and (c) show the loop fusion configurations for the two evaluation orders. Assume the same index ranges N_i = 500, N_j = 100, N_k = 40, and N_l = 15 as before. In (b), the maximum memory usage is 4141 elements, reached when f1, f3, f4, and f5 are allocated during the evaluation of f4 and f5. But in (c), when f1 is being evaluated, A, f1, and f3 coexist in memory, and their total size is 4600 elements. Therefore, in addition to finding the fusions between children and parents, we need to determine the evaluation order of the nodes that minimizes memory usage. The simpler problem of finding the memory-optimal evaluation order of the nodes in an expression tree without any fusion has been addressed in [9]. Here, we apply its result to each set of nodes having equal fusion to obtain the optimal evaluation order among them.

We use the same bottom-up, dynamic programming algorithm as in Figures 3 and 4 to traverse the expression tree and enumerate the legal fusions for the nodes, but the procedures related to the calculation of memory usage (namely, InitMemUsage, AddMemUsage, MergeMemUsage, and Inferior) are replaced by those in Figures 8 and 9. To correctly calculate memory usage and to determine the optimal evaluation order of nodes with equal fusion, each solution s for a node v now has, instead of a mem field, a seqset field, which is a set of fusion indexsets Set(f) for the fusions f in the subtree rooted at v. Each fusion indexset x in s.seqset is associated with an x.seq, which is a sequence of indivisible units. Each indivisible unit is a (nodelist, hi, lo) triple, where nodelist is an ordered list of nodes, hi is the maximum memory usage during the evaluation of the nodes in nodelist, and lo is the memory usage after those nodes are evaluated. The nodes in a nodelist have the special property that there necessarily exists some globally optimal traversal of the entire tree in which this sequence of nodes appears undivided. Therefore, inserting any node between the nodes of an indivisible unit cannot lower the total memory usage.

The seqsets and seqs are manipulated as follows. When two solutions from two children of the same parent are merged, their seqsets are merged by the MergeMemUsage procedure. A seq for a fusion indexset that appears in only one seqset is simply copied to the resulting seqset. If a fusion indexset x appears in both seqsets, the indivisible units in the two seqs for x are merge-sorted together in order of decreasing hi − lo difference (by the MergeSeq procedure) to form a combined seq, and their hi and lo values are adjusted. In [9], we proved that arranging the indivisible units in this order minimizes memory usage. Moreover, the indivisible units in a seq must appear in the same order in some optimal traversal of the entire expression tree.

When a solution is extended to include a new node v having fusion f with its parent, the seqs in v.seqset for fusion indexsets larger than Set(f) are collapsed into the seq for Set(f) (by the CollapseSeq procedure).

In collapsing a seq Q into another seq Q', all the indivisible units in Q are appended to Q' after their hi and lo values are adjusted. When all collapsings are done, all fusion indexsets in v.seqset are subsets of Set(f). Then, a new indivisible unit for v is appended to the seq for Set(f). Whenever an indivisible unit U is appended to a seq Q by the AppendSeq procedure, it is combined with the indivisible units at the end of Q whose hi is not higher than U.hi or whose lo is not lower than U.lo. The combined indivisible unit has the concatenated nodelist and the highest hi, but the lo of U. An indivisible unit initially contains a single node, but as the algorithm moves up the tree, indivisible units may be combined to form units that contain more nodes.

As the procedures for enumerating legal fusions are the same under static and dynamic memory allocation, we use a particular fusion graph, the one in Figure 7(a), to illustrate how memory usage is maintained by the algorithm under dynamic allocation. For node A, the fusion is ⟨j⟩ and the fused array size is N_i = 500. So, A.seqset has a single fusion indexset {j}, whose associated seq is ⟨(A, 500, 500)⟩. The seqsets for B and C are similarly obtained. Since f1 is not fused with f4, the AddMemUsage procedure collapses the only seq in the seqset from fusion indexset {j} to ∅. When a new indivisible unit (f1, 600, 100) for f1 is appended to the seq for ∅, it is combined with the existing indivisible unit into ([A, f1], 600, 100); now A and f1 have become indivisible. For node f2, the seqsets for B and C (which share no common fusion indexset) are first merged together by MergeMemUsage, and then a new seq for the fusion indexset of f2's fusion with f3 is created; f2.seqset thus has 3 seqs. To form the seqset for f3, which is not fused with f4, the AddMemUsage procedure first collapses the 3 seqs in f2.seqset into one, and then appends to it a new indivisible unit for f3, whose lo value is 4000, the full size of f3[j,k], forming the single seq in f3.seqset. The seqsets for all the nodes are shown in Figure 10. The single seq in the root node's seqset contains the optimal evaluation order of the nodes, and the hi value of the first indivisible unit in that seq is the maximum memory usage (see Figure 7(b)).
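The unit-combining rule of AppendSeq is compact enough to state in code. Below is our Python sketch of it (the hi/lo re-baselining done by MergeSeq and CollapseSeq is omitted):

def append_seq(seq, unit):
    """Append an indivisible (nodelist, hi, lo) unit to seq, combining
    it with trailing units whose hi is not higher or whose lo is not
    lower; the result keeps the concatenated nodelist, the highest hi,
    and the lo of the appended unit."""
    nodes, hi, lo = unit
    while seq and (seq[-1][1] <= hi or seq[-1][2] >= lo):
        pnodes, phi, plo = seq.pop()
        nodes, hi = pnodes + nodes, max(phi, hi)
    seq.append((nodes, hi, lo))
    return seq

# The step discussed above: A alone, then f1 arrives.
seq = [(["A"], 500, 500)]
append_seq(seq, (["f1"], 600, 100))
assert seq == [(["A", "f1"], 600, 100)]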

3.5 Further Reduction in Memory Usage

So far, we have restricted our attention to reducing memory usage without any increase in the number of arithmetic operations. Under this restriction, however, the optimal loop fusion configuration for some formula sequences may still require more memory than is available. It may then be necessary to relax the restriction on the operation count for the implementation of such formula sequences to become feasible in terms of memory usage.

Creating additional loops around some assignment statements may enable more loop fusions and thus reduce array sizes. To do this, we add additional vertices to the fusion graph and connect each new vertex of a node v to the corresponding vertex at the parent of v with an additional potential fusion edge. If an additional potential fusion edge for an i-loop is converted into a fusion edge, the operation count for node v is multiplied by N_i. For example, in Figure 2(b), if we create an l-loop around f4 and f5 and fuse it with the l-loop around C, B, f2, and f3, we can eliminate both dimensions of f3 and make it a scalar. However, this increases the operation counts for f4 and f5 by a factor of N_l, because they are re-computed N_l times. Note that if we place no limit on the operation count, the memory usage minimization problem has a trivial solution: put all the assignment statements inside one perfect loop nest. Doing so reduces all intermediate arrays to scalars, but the operation count returns to its original unoptimized value.

The existence of sparse arrays and common subexpressions and the use of fast Fourier transforms are characteristic of many practical scientific computations. Their presence may prohibit the fusion of some loops, thereby imposing high lower bounds on the sizes of some intermediate arrays. With common subexpressions, a directed acyclic graph (DAG) instead of an expression tree is needed to represent the formula sequence. Readers are referred to [12, 13] for how the optimization algorithms can be extended to handle these three characteristics, and for the transformations to the fusion graph that allow further memory reduction at a cost of increased arithmetic operations.
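The operation-count penalty of this transformation multiplies along the extra fused loops. A short Python sketch of the bookkeeping (ours, with illustrative inputs):

from math import prod

def penalized_ops(base_ops, extra_fused, N):
    """base_ops: node -> operation count of the original configuration;
    extra_fused: node -> indices of additional loops created around the
    node and fused; N: index -> loop range. Each extra fused i-loop
    re-executes the node's statement N[i] times."""
    return {v: ops * prod(N[i] for i in extra_fused.get(v, ()))
            for v, ops in base_ops.items()}

# Figure 2(b) example: an l-loop created around f4 and f5 multiplies
# their operation counts by N_l.
N = dict(i=500, j=100, k=40, l=15)
ops = penalized_ops({"f4": 100 * 40, "f5": 100 * 40},
                    {"f4": ["l"], "f5": ["l"]}, N)
assert ops == {"f4": 4000 * 15, "f5": 4000 * 15}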

The resulting formula sequence and fusion graph are then fed into one of the memory usage minimization algorithms described in Sections 3.3 and 3.4 to determine the optimal loop fusion configuration. The memory usage minimization algorithms can be modified to maintain the count of additional arithmetic operations for each solution: a solution s is inferior to another solution s' if, in addition to the existing criteria, the additional operations for s are greater than or equal to those for s'. Any solution with a cumulative memory usage higher than a user-specified limit can be pruned. At the end, the algorithms return an unpruned solution for the root node that has the fewest additional arithmetic operations.

3.6 An Example

We illustrate the practical application of the memory usage minimization algorithm on the following example formula sequence, which can be used to determine the self-energy in the electronic structure of solids. It is optimal in operation count, with a cost on the order of 10^14 operations. The ranges of the indices are N_k = 10, N_t = 100, N_RL = N_RL1 = N_RL2 = N_G = N_G1 = 1000, and N_r = N_r1 = 100000.

f1[r,RL,RL1,t] = Y[r,RL] * G[RL1,RL,t]
f2[r,RL1,t]    = sum_RL f1[r,RL,RL1,t]
f5[r,RL2,r1,t] = Y[r,RL2] * f2[r1,RL2,t]
f6[r,r1,t]     = sum_RL2 f5[r,RL2,r1,t]
f7[k,r,r1]     = exp[k,r] * exp[k,r1]
f10[r,r1,t]    = f6[r,r1,t] * f6[r1,r,t]
f11[k,r,r1,t]  = f7[k,r,r1] * f10[r,r1,t]
f13[k,r1,t,G]  = fft_r f11[k,r,r1,t] * exp[G,r]
f15[k,t,G,G1]  = fft_r1 f13[k,r1,t,G] * exp[G1,r1]
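To get a feel for the magnitudes involved, the following Python fragment (our back-of-the-envelope estimate, not the paper's accounting) computes dense per-array element counts from the stated index ranges. Y's 10% sparsity, which carries over to f1 and f5, is ignored here, which is why the total of about 1.1 × 10^14 elements cited below is well under the dense figure this prints:

from math import prod

N = dict(k=10, t=100, RL=1000, RL1=1000, RL2=1000, G=1000, G1=1000,
         r=100_000, r1=100_000)
arrays = {
    "f1":  ("r", "RL", "RL1", "t"),
    "f2":  ("r", "RL1", "t"),
    "f5":  ("r", "RL2", "r1", "t"),
    "f6":  ("r", "r1", "t"),
    "f7":  ("k", "r", "r1"),
    "f10": ("r", "r1", "t"),
    "f11": ("k", "r", "r1", "t"),
    "f13": ("k", "r1", "t", "G"),
    "f15": ("k", "t", "G", "G1"),
}
sizes = {name: prod(N[d] for d in dims) for name, dims in arrays.items()}
for name, size in sizes.items():
    print(f"{name}: {size:.1e} elements")     # f5 dominates at 1e15 dense
print(f"total (dense): {sum(sizes.values()):.1e}")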

In this example, the array Y is sparse, having only 10% non-zero elements. Notice that the common subexpressions Y, exp, and f6 appear on the right-hand side of more than one formula. Also, f13 and f15 are fast Fourier transform formulae. Discussions of how to handle sparse arrays, common subexpressions, and fast Fourier transforms can be found in [12, 13].

Without any loop fusion, the total size of the arrays is about 1.1 × 10^14 elements. If each array is to be computed only once, the presence of the common subexpressions and FFTs would prevent the fusion of some loops, such as the r and r1 loops between f6 and f10. Under the operation-count restriction, the optimal loop fusion configuration obtained by the memory usage minimization algorithm for static memory allocation requires memory storage for about 1.1 × 10^11 array elements, which is 1000 times better than without any loop fusion. But this still translates to about 1,000 gigabytes and probably exceeds the amount of memory available on any computer today. Thus, relaxing the operation-count restriction is necessary to further reduce the memory usage to reasonable values.

We perform the following simple transformations on the DAG and the corresponding fusion graph. Two additional vertices are added: one for a k-loop around f10 and the other for a t-loop around f7. These additional vertices are then connected to the corresponding vertices of f11 with additional potential fusion edges, to allow more loop fusion opportunities between f11 and its two children. The common subexpressions Y, exp, and f6 are split into multiple nodes. Two copies of the sub-DAG rooted at f6 are also made. These transformations overcome some constraints on legal fusion graphs for DAGs.

The memory usage minimization algorithm for static memory allocation is then applied to the transformed fusion graph. The loop fusion configuration, the fusion graph, and the memory usage and operation count statistics of the optimal solution found are shown in Figure 11. For clarity, the input arrays are not included in the fusion graph.

The memory usage of the optimal solution after relaxing the operation-count restriction is reduced significantly, by a factor of about 100, to roughly 1.1 × 10^9 array elements. The operation count increases by only around 10%. Compared with the best hand-optimized loop fusion configuration, which also uses some manually-applied transformations to reduce memory usage to the same level, the optimal loop fusion configuration obtained by the algorithm shows a factor of 2.5 improvement in operation count while using the same amount of memory.

4 Communication Minimization on Parallel Computers with Memory Constraint

Given a sequence of formulae, we now address the problem of finding the optimal partitioning of arrays and operations among the processors, and the loop fusions on each processor, so as to minimize inter-processor communication and computational costs while staying within the available memory when implementing the computation on a message-passing parallel computer. Section 4.1 briefly describes a multi-dimensional processor view and discusses the effects of loop fusions and array/operation partitioning on communication cost, computational cost, and memory usage. Two approaches for solving this problem are presented in Section 4.2.

4.1 Preliminaries

A logical view of the processors as a multi-dimensional grid is used, where each array can be distributed or replicated along one or more of the processor dimensions. Let p_d be the number of processors on the d-th dimension of an n-dimensional processor array, so that the number of processors is p_1 × ... × p_n. We use an n-tuple to denote the partitioning or distribution of the elements of a data array on an n-dimensional processor array.

The d-th position in an n-tuple α, denoted α_d, corresponds to the d-th processor dimension. Each position may be one of the following: an index variable distributed along that processor dimension, a '*' denoting replication of data along that processor dimension, or a '1' denoting that only the first processor along that processor dimension is assigned any data. If an index variable appears as an array subscript but not in the n-tuple, then the corresponding dimension of the array is not distributed. Conversely, if an index variable appears in the n-tuple but not in the array, then the data are replicated along the corresponding processor dimension, which is the same as replacing that index variable with a '*'.

As an example, suppose 128 processors form a four-dimensional grid. For the array B[j,k,l] in Figure 1, the 4-tuple ⟨j,k,*,1⟩ specifies that the first and the second dimensions of B are distributed along the first and second processor dimensions, respectively (the third dimension of B is not distributed), and that data are replicated along the third processor dimension and are assigned only to processors whose fourth coordinate equals 1. Thus, a processor with id P(z1,z2,z3,z4) is assigned the portion of B given by myrange(z1, N_j, p_1) × myrange(z2, N_k, p_2) × [1, N_l] if z4 = 1, and no part of B otherwise, where myrange(z, N, p) is the range (z−1) × N/p + 1 to z × N/p.

We assume the SPMD programming model and do not consider distributing the computation of different formulae on different processors. A child array (or a part of it) is redistributed before the evaluation of its parent if their distributions do not match. For instance, suppose the arrays B and C have distributions ⟨k,l,*,1⟩ and ⟨*,k,*,1⟩, respectively. If we want f2 to have distribution ⟨j,k,*,1⟩ when evaluating f2[j,k,l] = B[j,k,l] × C[k,l], B would have to be redistributed from ⟨k,l,*,1⟩ to ⟨j,k,*,1⟩ because the two distributions do not match. But since j does not appear in C, the distribution ⟨j,k,*,1⟩ is, for C, the same as ⟨*,k,*,1⟩, and C is not redistributed.
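A Python sketch of this addressing scheme (ours; 1-based coordinates as in the paper, and a hypothetical 2x4x4x4 grid in the usage line):

def myrange(z, n, p):
    """Block of an index of range n held by the z-th of p processors."""
    return ((z - 1) * n // p + 1, z * n // p)

def local_part(tuple_, grid, proc, dims, N):
    """tuple_: n-tuple entries ('*', 1, or an index name); grid: number
    of processors per grid dimension; proc: this processor's 1-based
    coordinates; dims: the array's subscript indices; N: index ranges.
    Returns index -> (first, last), or None if no data lands here."""
    part = {d: (1, N[d]) for d in dims}            # default: full range
    for g, entry in enumerate(tuple_):
        if entry == 1 and proc[g] != 1:
            return None                            # only the first processor holds data
        if entry in part:
            part[entry] = myrange(proc[g], N[entry], grid[g])
    return part

# B[j,k,l] with distribution <j,k,*,1> on a hypothetical 2x4x4x4 grid:
N = dict(j=100, k=40, l=15)
print(local_part(("j", "k", "*", 1), (2, 4, 4, 4), (1, 2, 3, 1),
                 ("j", "k", "l"), N))   # {'j': (1, 50), 'k': (11, 20), 'l': (1, 15)}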

The partitioning of data arrays among the processors and the allowable fusions of loops on each processor are inter-related. For the fusion of a t-loop between nodes u and v to be possible, that loop must either be undistributed at both u and v, or be distributed onto the same number of processors at u and at v (though not necessarily along the same processor dimension). Otherwise, the range of the t-loop at node u would differ from that at node v. For example, if the array B in Figure 1 has its j-dimension distributed onto 2 processors and has fusion ⟨jl⟩ with f2, then f2 can have a distribution in which j is distributed onto 2 processors, but not one in which l is distributed, because the fusion ⟨jl⟩ forces the j-dimension of f2 to be distributed onto 2 processors and the l-dimension to be undistributed.

Array partitioning and loop fusion also affect memory usage, communication cost, and computational cost. Fusing a t-loop between a node v and its parent eliminates the t-dimension of array v. If the t-loop is not fused but the t-dimension of array v is distributed along the d-th processor dimension, then the range of the t-dimension of array v on each processor is reduced to N_t / p_d. Let DistSize(f, v, α) be the size on each processor of array v, which has fusion f with its parent and distribution α. We have

DistSize(f, v, α) = Π_{x ∈ v.dimens} DistRange(v, α, Set(f), x)

where v.dimens = v.indices − {v.sumindex} is the set of array dimension indices of v before loop fusions, and

DistRange(v, α, F, x) = 1 if x ∈ F;
                        N_x / p_d if x ∉ F and α_d = x for some d;
                        N_x otherwise.

In our example, assume again that N_i = 500, N_j = 100, N_k = 40, and N_l = 15. If the array B has its k-dimension distributed onto 2 processors and fusion ⟨jl⟩ with f2, then the size of B on each processor would be N_k / 2 = 20, since the k-dimension is the only unfused dimension and is distributed onto 2 processors.
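In code form (our sketch), DistSize is a product over the array's unfused dimensions:

def dist_size(dims, fused, alpha, grid, N):
    """Per-processor size of an array: dims: its dimension indices;
    fused: indices fused with the parent (eliminated); alpha: the
    distribution n-tuple; grid: processors per grid dimension."""
    size = 1
    for x in dims:
        if x in fused:
            continue                               # DistRange = 1: dimension eliminated
        if x in alpha:
            size *= N[x] // grid[alpha.index(x)]   # distributed dimension
        else:
            size *= N[x]                           # undistributed dimension
    return size

# B[j,k,l], fusion <jl> with f2, k distributed onto 2 processors:
N = dict(j=100, k=40, l=15)
assert dist_size(("j", "k", "l"), {"j", "l"}, ("k", "*", 1, 1), (2, 4, 4, 4), N) == 20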

Note that if array v undergoes redistribution from α to β, the array size on each processor after redistribution is DistSize(f, v, β), which could be different from DistSize(f, v, α), the size before redistribution.

The initial and final distributions of an array v determine the communication pattern and whether v needs redistribution, while loop fusions change the number of times array v is redistributed and the size of each message. Let v be an array that needs to be redistributed. If node v is not fused with its parent, array v is redistributed only once. Fusing a t-loop between node v and its parent puts the collective communication code for the redistribution inside the t-loop. Thus, the number of redistributions is increased by a factor of N_t / p_d if the t-dimension of v is distributed along the d-th processor dimension, and by a factor of N_t if the t-dimension of v is not distributed. In other words, loop fusions cannot reduce communication cost. Continuing with our example, if the array B has fusion ⟨jl⟩ with f2 and needs to be redistributed, with its j-dimension distributed onto 2 processors, it would be redistributed (N_j / 2) × N_l = 750 times.

Let Mcost(localsize, α, β) be the communication cost of moving the elements of an array, with localsize elements on each processor, from an initial distribution α to a final distribution β. We empirically measure Mcost for each possible non-matching pair of α and β and for several different values of localsize on the target parallel computer. Let MoveCost(f, v, α, β) denote the communication cost of redistributing the elements of array v, which has fusion f with its parent, from an initial distribution α to a final distribution β. It can be expressed as

MoveCost(f, v, α, β) = MsgFactor(v, α, Set(f)) × Mcost(DistSize(f, v, α), α, β)

where MsgFactor(v, α, F) = Π_{x ∈ F ∩ v.dimens} LoopRange(v, α, x), and LoopRange(v, α, x) is the per-processor range of the fused x-loop, i.e., N_x / p_d if α_d = x for some d, and N_x otherwise.


More information

Kyle Gettig. Mentor Benjamin Iriarte Fourth Annual MIT PRIMES Conference May 17, 2014

Kyle Gettig. Mentor Benjamin Iriarte Fourth Annual MIT PRIMES Conference May 17, 2014 Linear Extensions of Directed Acyclic Graphs Kyle Gettig Mentor Benjamin Iriarte Fourth Annual MIT PRIMES Conference May 17, 2014 Motivation Individuals can be modeled as vertices of a graph, with edges

More information

SFU CMPT Lecture: Week 8

SFU CMPT Lecture: Week 8 SFU CMPT-307 2008-2 1 Lecture: Week 8 SFU CMPT-307 2008-2 Lecture: Week 8 Ján Maňuch E-mail: jmanuch@sfu.ca Lecture on June 24, 2008, 5.30pm-8.20pm SFU CMPT-307 2008-2 2 Lecture: Week 8 Universal hashing

More information

SFU CMPT Lecture: Week 9

SFU CMPT Lecture: Week 9 SFU CMPT-307 2008-2 1 Lecture: Week 9 SFU CMPT-307 2008-2 Lecture: Week 9 Ján Maňuch E-mail: jmanuch@sfu.ca Lecture on July 8, 2008, 5.30pm-8.20pm SFU CMPT-307 2008-2 2 Lecture: Week 9 Binary search trees

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Ulysses: A Robust, Low-Diameter, Low-Latency Peer-to-Peer Network

Ulysses: A Robust, Low-Diameter, Low-Latency Peer-to-Peer Network 1 Ulysses: A Robust, Low-Diameter, Low-Latency Peer-to-Peer Network Abhishek Kumar Shashidhar Merugu Jun (Jim) Xu Xingxing Yu College of Computing, School of Mathematics, Georgia Institute of Technology,

More information

An algorithm for Performance Analysis of Single-Source Acyclic graphs

An algorithm for Performance Analysis of Single-Source Acyclic graphs An algorithm for Performance Analysis of Single-Source Acyclic graphs Gabriele Mencagli September 26, 2011 In this document we face with the problem of exploiting the performance analysis of acyclic graphs

More information

Paths, Flowers and Vertex Cover

Paths, Flowers and Vertex Cover Paths, Flowers and Vertex Cover Venkatesh Raman, M.S. Ramanujan, and Saket Saurabh Presenting: Hen Sender 1 Introduction 2 Abstract. It is well known that in a bipartite (and more generally in a Konig)

More information

Material You Need to Know

Material You Need to Know Review Quiz 2 Material You Need to Know Normalization Storage and Disk File Layout Indexing B-trees and B+ Trees Extensible Hashing Linear Hashing Decomposition Goals: Lossless Joins, Dependency preservation

More information

5. Lecture notes on matroid intersection

5. Lecture notes on matroid intersection Massachusetts Institute of Technology Handout 14 18.433: Combinatorial Optimization April 1st, 2009 Michel X. Goemans 5. Lecture notes on matroid intersection One nice feature about matroids is that a

More information

End-to-end bandwidth guarantees through fair local spectrum share in wireless ad-hoc networks

End-to-end bandwidth guarantees through fair local spectrum share in wireless ad-hoc networks End-to-end bandwidth guarantees through fair local spectrum share in wireless ad-hoc networks Saswati Sarkar and Leandros Tassiulas 1 Abstract Sharing the locally common spectrum among the links of the

More information

Computing optimal linear layouts of trees in linear time

Computing optimal linear layouts of trees in linear time Computing optimal linear layouts of trees in linear time Konstantin Skodinis University of Passau, 94030 Passau, Germany, e-mail: skodinis@fmi.uni-passau.de Abstract. We present a linear time algorithm

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

Qualifying Exam in Programming Languages and Compilers

Qualifying Exam in Programming Languages and Compilers Qualifying Exam in Programming Languages and Compilers University of Wisconsin Fall 1991 Instructions This exam contains nine questions, divided into two parts. All students taking the exam should answer

More information

CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS

CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS 1 UNIT I INTRODUCTION CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS 1. Define Graph. A graph G = (V, E) consists

More information

Interleaving Schemes on Circulant Graphs with Two Offsets

Interleaving Schemes on Circulant Graphs with Two Offsets Interleaving Schemes on Circulant raphs with Two Offsets Aleksandrs Slivkins Department of Computer Science Cornell University Ithaca, NY 14853 slivkins@cs.cornell.edu Jehoshua Bruck Department of Electrical

More information

CMSC 754 Computational Geometry 1

CMSC 754 Computational Geometry 1 CMSC 754 Computational Geometry 1 David M. Mount Department of Computer Science University of Maryland Fall 2005 1 Copyright, David M. Mount, 2005, Dept. of Computer Science, University of Maryland, College

More information

Algorithm classification

Algorithm classification Types of Algorithms Algorithm classification Algorithms that use a similar problem-solving approach can be grouped together We ll talk about a classification scheme for algorithms This classification scheme

More information

6. Lecture notes on matroid intersection

6. Lecture notes on matroid intersection Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm

More information

IEEE LANGUAGE REFERENCE MANUAL Std P1076a /D3

IEEE LANGUAGE REFERENCE MANUAL Std P1076a /D3 LANGUAGE REFERENCE MANUAL Std P1076a-1999 2000/D3 Clause 10 Scope and visibility The rules defining the scope of declarations and the rules defining which identifiers are visible at various points in the

More information

arxiv: v2 [cs.ds] 18 May 2015

arxiv: v2 [cs.ds] 18 May 2015 Optimal Shuffle Code with Permutation Instructions Sebastian Buchwald, Manuel Mohr, and Ignaz Rutter Karlsruhe Institute of Technology {sebastian.buchwald, manuel.mohr, rutter}@kit.edu arxiv:1504.07073v2

More information

Graphs (MTAT , 4 AP / 6 ECTS) Lectures: Fri 12-14, hall 405 Exercises: Mon 14-16, hall 315 või N 12-14, aud. 405

Graphs (MTAT , 4 AP / 6 ECTS) Lectures: Fri 12-14, hall 405 Exercises: Mon 14-16, hall 315 või N 12-14, aud. 405 Graphs (MTAT.05.080, 4 AP / 6 ECTS) Lectures: Fri 12-14, hall 405 Exercises: Mon 14-16, hall 315 või N 12-14, aud. 405 homepage: http://www.ut.ee/~peeter_l/teaching/graafid08s (contains slides) For grade:

More information

Matching Algorithms. Proof. If a bipartite graph has a perfect matching, then it is easy to see that the right hand side is a necessary condition.

Matching Algorithms. Proof. If a bipartite graph has a perfect matching, then it is easy to see that the right hand side is a necessary condition. 18.433 Combinatorial Optimization Matching Algorithms September 9,14,16 Lecturer: Santosh Vempala Given a graph G = (V, E), a matching M is a set of edges with the property that no two of the edges have

More information

Graph Algorithms Using Depth First Search

Graph Algorithms Using Depth First Search Graph Algorithms Using Depth First Search Analysis of Algorithms Week 8, Lecture 1 Prepared by John Reif, Ph.D. Distinguished Professor of Computer Science Duke University Graph Algorithms Using Depth

More information

Laboratory Module X B TREES

Laboratory Module X B TREES Purpose: Purpose 1... Purpose 2 Purpose 3. Laboratory Module X B TREES 1. Preparation Before Lab When working with large sets of data, it is often not possible or desirable to maintain the entire structure

More information

CS301 - Data Structures Glossary By

CS301 - Data Structures Glossary By CS301 - Data Structures Glossary By Abstract Data Type : A set of data values and associated operations that are precisely specified independent of any particular implementation. Also known as ADT Algorithm

More information

MAT 7003 : Mathematical Foundations. (for Software Engineering) J Paul Gibson, A207.

MAT 7003 : Mathematical Foundations. (for Software Engineering) J Paul Gibson, A207. MAT 7003 : Mathematical Foundations (for Software Engineering) J Paul Gibson, A207 paul.gibson@it-sudparis.eu http://www-public.it-sudparis.eu/~gibson/teaching/mat7003/ Graphs and Trees http://www-public.it-sudparis.eu/~gibson/teaching/mat7003/l2-graphsandtrees.pdf

More information

RSA (Rivest Shamir Adleman) public key cryptosystem: Key generation: Pick two large prime Ô Õ ¾ numbers È.

RSA (Rivest Shamir Adleman) public key cryptosystem: Key generation: Pick two large prime Ô Õ ¾ numbers È. RSA (Rivest Shamir Adleman) public key cryptosystem: Key generation: Pick two large prime Ô Õ ¾ numbers È. Let Ò Ô Õ. Pick ¾ ½ ³ Òµ ½ so, that ³ Òµµ ½. Let ½ ÑÓ ³ Òµµ. Public key: Ò µ. Secret key Ò µ.

More information

CSE 431/531: Algorithm Analysis and Design (Spring 2018) Greedy Algorithms. Lecturer: Shi Li

CSE 431/531: Algorithm Analysis and Design (Spring 2018) Greedy Algorithms. Lecturer: Shi Li CSE 431/531: Algorithm Analysis and Design (Spring 2018) Greedy Algorithms Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Main Goal of Algorithm Design Design fast

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Graphs and Network Flows ISE 411. Lecture 7. Dr. Ted Ralphs

Graphs and Network Flows ISE 411. Lecture 7. Dr. Ted Ralphs Graphs and Network Flows ISE 411 Lecture 7 Dr. Ted Ralphs ISE 411 Lecture 7 1 References for Today s Lecture Required reading Chapter 20 References AMO Chapter 13 CLRS Chapter 23 ISE 411 Lecture 7 2 Minimum

More information

managing an evolving set of connected components implementing a Union-Find data structure implementing Kruskal s algorithm

managing an evolving set of connected components implementing a Union-Find data structure implementing Kruskal s algorithm Spanning Trees 1 Spanning Trees the minimum spanning tree problem three greedy algorithms analysis of the algorithms 2 The Union-Find Data Structure managing an evolving set of connected components implementing

More information

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees.

Throughout the chapter, we will assume that the reader is familiar with the basics of phylogenetic trees. Chapter 7 SUPERTREE ALGORITHMS FOR NESTED TAXA Philip Daniel and Charles Semple Abstract: Keywords: Most supertree algorithms combine collections of rooted phylogenetic trees with overlapping leaf sets

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Mathematical and Algorithmic Foundations Linear Programming and Matchings

Mathematical and Algorithmic Foundations Linear Programming and Matchings Adavnced Algorithms Lectures Mathematical and Algorithmic Foundations Linear Programming and Matchings Paul G. Spirakis Department of Computer Science University of Patras and Liverpool Paul G. Spirakis

More information

Chapter 11: Query Optimization

Chapter 11: Query Optimization Chapter 11: Query Optimization Chapter 11: Query Optimization Introduction Transformation of Relational Expressions Statistical Information for Cost Estimation Cost-based optimization Dynamic Programming

More information

Graph Traversal. 1 Breadth First Search. Correctness. find all nodes reachable from some source node s

Graph Traversal. 1 Breadth First Search. Correctness. find all nodes reachable from some source node s 1 Graph Traversal 1 Breadth First Search visit all nodes and edges in a graph systematically gathering global information find all nodes reachable from some source node s Prove this by giving a minimum

More information

Kernelization Upper Bounds for Parameterized Graph Coloring Problems

Kernelization Upper Bounds for Parameterized Graph Coloring Problems Kernelization Upper Bounds for Parameterized Graph Coloring Problems Pim de Weijer Master Thesis: ICA-3137910 Supervisor: Hans L. Bodlaender Computing Science, Utrecht University 1 Abstract This thesis

More information

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs

Integer Programming ISE 418. Lecture 7. Dr. Ted Ralphs Integer Programming ISE 418 Lecture 7 Dr. Ted Ralphs ISE 418 Lecture 7 1 Reading for This Lecture Nemhauser and Wolsey Sections II.3.1, II.3.6, II.4.1, II.4.2, II.5.4 Wolsey Chapter 7 CCZ Chapter 1 Constraint

More information

Matching Theory. Figure 1: Is this graph bipartite?

Matching Theory. Figure 1: Is this graph bipartite? Matching Theory 1 Introduction A matching M of a graph is a subset of E such that no two edges in M share a vertex; edges which have this property are called independent edges. A matching M is said to

More information

Solution for Homework set 3

Solution for Homework set 3 TTIC 300 and CMSC 37000 Algorithms Winter 07 Solution for Homework set 3 Question (0 points) We are given a directed graph G = (V, E), with two special vertices s and t, and non-negative integral capacities

More information

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems

Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Ramin Zabih Computer Science Department Stanford University Stanford, California 94305 Abstract Bandwidth is a fundamental concept

More information

SDD Advanced-User Manual Version 1.1

SDD Advanced-User Manual Version 1.1 SDD Advanced-User Manual Version 1.1 Arthur Choi and Adnan Darwiche Automated Reasoning Group Computer Science Department University of California, Los Angeles Email: sdd@cs.ucla.edu Download: http://reasoning.cs.ucla.edu/sdd

More information

Worst-Case Utilization Bound for EDF Scheduling on Real-Time Multiprocessor Systems

Worst-Case Utilization Bound for EDF Scheduling on Real-Time Multiprocessor Systems Worst-Case Utilization Bound for EDF Scheduling on Real-Time Multiprocessor Systems J.M. López, M. García, J.L. Díaz, D.F. García University of Oviedo Department of Computer Science Campus de Viesques,

More information

Reachability in K 3,3 -free and K 5 -free Graphs is in Unambiguous Logspace

Reachability in K 3,3 -free and K 5 -free Graphs is in Unambiguous Logspace CHICAGO JOURNAL OF THEORETICAL COMPUTER SCIENCE 2014, Article 2, pages 1 29 http://cjtcs.cs.uchicago.edu/ Reachability in K 3,3 -free and K 5 -free Graphs is in Unambiguous Logspace Thomas Thierauf Fabian

More information

Optimization I : Brute force and Greedy strategy

Optimization I : Brute force and Greedy strategy Chapter 3 Optimization I : Brute force and Greedy strategy A generic definition of an optimization problem involves a set of constraints that defines a subset in some underlying space (like the Euclidean

More information

CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics

CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics 1 Sorting 1.1 Problem Statement You are given a sequence of n numbers < a 1, a 2,..., a n >. You need to

More information

15.4 Longest common subsequence

15.4 Longest common subsequence 15.4 Longest common subsequence Biological applications often need to compare the DNA of two (or more) different organisms A strand of DNA consists of a string of molecules called bases, where the possible

More information

CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms

CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms Edgar Solomonik University of Illinois at Urbana-Champaign October 12, 2016 Defining

More information

Database Languages and their Compilers

Database Languages and their Compilers Database Languages and their Compilers Prof. Dr. Torsten Grust Database Systems Research Group U Tübingen Winter 2010 2010 T. Grust Database Languages and their Compilers 4 Query Normalization Finally,

More information

looking ahead to see the optimum

looking ahead to see the optimum ! Make choice based on immediate rewards rather than looking ahead to see the optimum! In many cases this is effective as the look ahead variation can require exponential time as the number of possible

More information

Data Structures and Algorithms

Data Structures and Algorithms Data Structures and Algorithms CS245-2008S-19 B-Trees David Galles Department of Computer Science University of San Francisco 19-0: Indexing Operations: Add an element Remove an element Find an element,

More information

Simpler, Linear-time Transitive Orientation via Lexicographic Breadth-First Search

Simpler, Linear-time Transitive Orientation via Lexicographic Breadth-First Search Simpler, Linear-time Transitive Orientation via Lexicographic Breadth-First Search Marc Tedder University of Toronto arxiv:1503.02773v1 [cs.ds] 10 Mar 2015 Abstract Comparability graphs are the undirected

More information

Maximal Monochromatic Geodesics in an Antipodal Coloring of Hypercube

Maximal Monochromatic Geodesics in an Antipodal Coloring of Hypercube Maximal Monochromatic Geodesics in an Antipodal Coloring of Hypercube Kavish Gandhi April 4, 2015 Abstract A geodesic in the hypercube is the shortest possible path between two vertices. Leader and Long

More information

Binary Trees

Binary Trees Binary Trees 4-7-2005 Opening Discussion What did we talk about last class? Do you have any code to show? Do you have any questions about the assignment? What is a Tree? You are all familiar with what

More information

Indexing and Hashing

Indexing and Hashing C H A P T E R 1 Indexing and Hashing This chapter covers indexing techniques ranging from the most basic one to highly specialized ones. Due to the extensive use of indices in database systems, this chapter

More information

16 Greedy Algorithms

16 Greedy Algorithms 16 Greedy Algorithms Optimization algorithms typically go through a sequence of steps, with a set of choices at each For many optimization problems, using dynamic programming to determine the best choices

More information

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18

1 Format. 2 Topics Covered. 2.1 Minimal Spanning Trees. 2.2 Union Find. 2.3 Greedy. CS 124 Quiz 2 Review 3/25/18 CS 124 Quiz 2 Review 3/25/18 1 Format You will have 83 minutes to complete the exam. The exam may have true/false questions, multiple choice, example/counterexample problems, run-this-algorithm problems,

More information

4 Fractional Dimension of Posets from Trees

4 Fractional Dimension of Posets from Trees 57 4 Fractional Dimension of Posets from Trees In this last chapter, we switch gears a little bit, and fractionalize the dimension of posets We start with a few simple definitions to develop the language

More information

RSA (Rivest Shamir Adleman) public key cryptosystem: Key generation: Pick two large prime Ô Õ ¾ numbers È.

RSA (Rivest Shamir Adleman) public key cryptosystem: Key generation: Pick two large prime Ô Õ ¾ numbers È. RSA (Rivest Shamir Adleman) public key cryptosystem: Key generation: Pick two large prime Ô Õ ¾ numbers È. Let Ò Ô Õ. Pick ¾ ½ ³ Òµ ½ so, that ³ Òµµ ½. Let ½ ÑÓ ³ Òµµ. Public key: Ò µ. Secret key Ò µ.

More information

( ) D. Θ ( ) ( ) Ο f ( n) ( ) Ω. C. T n C. Θ. B. n logn Ο

( ) D. Θ ( ) ( ) Ο f ( n) ( ) Ω. C. T n C. Θ. B. n logn Ο CSE 0 Name Test Fall 0 Multiple Choice. Write your answer to the LEFT of each problem. points each. The expected time for insertion sort for n keys is in which set? (All n! input permutations are equally

More information

Discrete mathematics

Discrete mathematics Discrete mathematics Petr Kovář petr.kovar@vsb.cz VŠB Technical University of Ostrava DiM 470-2301/02, Winter term 2018/2019 About this file This file is meant to be a guideline for the lecturer. Many

More information

On the Performance of Greedy Algorithms in Packet Buffering

On the Performance of Greedy Algorithms in Packet Buffering On the Performance of Greedy Algorithms in Packet Buffering Susanne Albers Ý Markus Schmidt Þ Abstract We study a basic buffer management problem that arises in network switches. Consider input ports,

More information

Distributed Data Structures and Algorithms for Disjoint Sets in Computing Connected Components of Huge Network

Distributed Data Structures and Algorithms for Disjoint Sets in Computing Connected Components of Huge Network Distributed Data Structures and Algorithms for Disjoint Sets in Computing Connected Components of Huge Network Wing Ning Li, CSCE Dept. University of Arkansas, Fayetteville, AR 72701 wingning@uark.edu

More information

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions

Data Partitioning. Figure 1-31: Communication Topologies. Regular Partitions Data In single-program multiple-data (SPMD) parallel programs, global data is partitioned, with a portion of the data assigned to each processing node. Issues relevant to choosing a partitioning strategy

More information

CSE 421 Applications of DFS(?) Topological sort

CSE 421 Applications of DFS(?) Topological sort CSE 421 Applications of DFS(?) Topological sort Yin Tat Lee 1 Precedence Constraints In a directed graph, an edge (i, j) means task i must occur before task j. Applications Course prerequisite: course

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Scribe: Virginia Williams, Sam Kim (2016), Mary Wootters (2017) Date: May 22, 2017

Scribe: Virginia Williams, Sam Kim (2016), Mary Wootters (2017) Date: May 22, 2017 CS6 Lecture 4 Greedy Algorithms Scribe: Virginia Williams, Sam Kim (26), Mary Wootters (27) Date: May 22, 27 Greedy Algorithms Suppose we want to solve a problem, and we re able to come up with some recursive

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees

A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Fundamenta Informaticae 56 (2003) 105 120 105 IOS Press A Fast Algorithm for Optimal Alignment between Similar Ordered Trees Jesper Jansson Department of Computer Science Lund University, Box 118 SE-221

More information

2. Lecture notes on non-bipartite matching

2. Lecture notes on non-bipartite matching Massachusetts Institute of Technology 18.433: Combinatorial Optimization Michel X. Goemans February 15th, 013. Lecture notes on non-bipartite matching Given a graph G = (V, E), we are interested in finding

More information

A Note on Scheduling Parallel Unit Jobs on Hypercubes

A Note on Scheduling Parallel Unit Jobs on Hypercubes A Note on Scheduling Parallel Unit Jobs on Hypercubes Ondřej Zajíček Abstract We study the problem of scheduling independent unit-time parallel jobs on hypercubes. A parallel job has to be scheduled between

More information