On Partitioning FEM Graphs using Diffusion


On Partitioning FEM Graphs using Diffusion

Stefan Schamberger
Universität Paderborn, Fakultät für Elektrotechnik, Informatik und Mathematik
Fürstenallee 11, D-33102 Paderborn

Abstract

To solve the graph partitioning problem, efficient heuristics have been developed that are also capable of distributing the computational load in parallel FEM computations. However, although a few parallel implementations do exist, the involved algorithms are hard to parallelize due to their sequential nature. This paper presents a new approach to the FEM graph partitioning problem. Applying diffusion as the growing mechanism, we are able to eliminate restrictions of former implementations based on the bubble framework and construct a relatively simple algorithm with a high degree of natural parallelism. We demonstrate that it computes solutions comparable to those of established heuristics. Its drawback is the long execution time if the parallelism is not exploited.

Keywords: FEM graph partitioning, load balancing, diffusion, first-order-scheme, bubble

1 Introduction

Graph partitioning is an important subproblem in many applications. One of them consists in balancing the computational load in distributed (adaptive) Finite Element Method (FEM) computations. Since these computations usually follow the single-program multiple-data paradigm, the same code is executed on all processors but on different parts of the data. This implies that the mesh discretizing the continuous simulation space has to be partitioned into P subdomains, each assigned to one of the P processors involved in the computation. The applied iterative solvers mainly perform local operations defined by adjacencies in the mesh, hence the parallel algorithms mostly require communication at the partition boundaries. Thus, the parallel efficiency depends on two factors: an equal distribution of load on the processors and a small communication overhead, achieved by minimizing the number of messages exchanged between the different parts of the mesh.

This work was partly supported by the German Science Foundation (DFG) project SFB-376 and by the IST Program of the EU under contract numbers IST (ALCOM-FT) and IST (FLAGS).

The communication pattern of FEM computations can be modeled by a graph where the vertices represent the data and the edges the dependencies. It is known that the graph partitioning problem is NP-complete, but fortunately a number of efficient heuristics have been developed that find good solutions. State-of-the-art sequential libraries like Metis [1], Jostle [2] and Party [3] are based on the multilevel paradigm introduced in [4], combined with a matching strategy to coarsen the graph and a local refinement, often a Kernighan-Lin like algorithm [5], that is applied on every level during the uncoarsening. By exchanging vertices or vertex sets between partitions, these refinement steps try to further reduce the edge cut (or any alternative metric) and are therefore essential for finding a good solution. However, due to the fine granularity of these steps, their interdependencies and the involvement of more than one partition, these procedures are hard to parallelize. Different approaches have been proposed to maintain the integrity of the distributed data structure and ensure the efficiency of the local heuristics. For example, the parallel version of Metis [6] disallows adjacent vertices to be active at the same time by computing a graph coloring. In the same round, only vertices of one color may be transferred to neighboring partitions.
More advanced methods are implemented in Jostle [2], which is also a good reference for the problems usually encountered when parallelizing the sequential improvement steps and for different ways to deal with them. Another overview of distributed graph partitioning algorithms can be found in [7].

The global edge cut is the classical metric that most graph partitioners optimize. In case of FEM computations, this is not necessarily the best metric to follow because it does not model the real communication and runtime costs, as described in [8]. Hence, different metrics have been implemented inside the local refinement process that model the real objectives more closely. In [9], the costs emerging from vertex transfers are taken into consideration, while Metis [1] is capable of minimizing the subdomain connectivity as well as the communication costs. A completely different approach is undertaken in [10]: since the convergence rate of the domain decomposition solver in the PadFEM environment depends on the geometric shape of a partition, the integrated load balancer focuses on iteratively optimizing the partitions' aspect ratio by applying a bubble-like algorithm.
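To make these objectives concrete, the following minimal sketch (illustrative Python, not taken from any of the cited libraries; the adjacency-list representation and the function names are assumptions) evaluates the classical edge cut and the balance of a given vertex-to-partition assignment:

def edge_cut(adj, part):
    # Number of edges whose two endpoints belong to different partitions.
    cut = sum(1 for v, nbrs in adj.items() for w in nbrs if part[v] != part[w])
    return cut // 2                                  # each undirected edge counted twice

def balance(adj, part, num_parts):
    # Size of the largest partition divided by the average size (1.0 = perfectly balanced).
    sizes = [0] * num_parts
    for v in adj:
        sizes[part[v]] += 1
    return max(sizes) / (len(adj) / num_parts)

# Example: a 2x2 grid split into two halves has edge cut 2 and perfect balance.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(edge_cut(adj, {0: 0, 1: 0, 2: 1, 3: 1}), balance(adj, {0: 0, 1: 0, 2: 1, 3: 1}, 2))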

Figure 1. The three phases of a bubble algorithm: determination of initial seeds for each partition (left); growing in a breadth-first manner around the seeds (middle); movement of the seeds to the partition centers (right). This is repeated until a stable state is reached.

Although different from the multilevel schemes, this approach still contains a strictly serial part and suffers from some other difficulties which we describe in more detail in section 2. The new approach proposed in this paper is based on the bubble framework, too, but we replace the most important operations by diffusive ones. Although we have only implemented a sequential version of the algorithm yet, we are convinced that its parallelization is not too complicated since it does not contain any strictly sequential parts. We demonstrate that the proposed diffusion based method is applicable and that the delivered solutions are comparable with those of state-of-the-art partitioning libraries.

The remaining part of this paper is organized as follows: In the next section we describe the bubble framework in more detail and also discuss its existing implementations. In section 3 we briefly introduce the first order diffusion scheme which we integrate as the growing mechanism. The resulting algorithm is described in section 4. Section 5 shows how we perform our experiments and presents comparisons between results of the new algorithm and those of other partitioning libraries. The last section contains a conclusion, a discussion of further work and a number of open questions.

2 The Bubble Framework

The bubble framework (also described in [10]) has evolved from simple greedy algorithms computing bisections of graphs. Starting with an initial, often randomly chosen vertex (seed) per partition, all subdomains are grown simultaneously in a breadth-first manner. Colliding parts form a common border and keep on growing along this border just like soap bubbles. After the whole mesh has been covered and all vertices of the graph have been assigned this way, each component computes its new center, which acts as the seed in the next iteration. This is usually repeated until a stable state is reached, in which the movement of all seeds is small enough. The procedure is based on the observation that within perfect bubbles, the center and the seed vertex coincide. Distances in this framework may either be chosen as Euclidean distances or as path lengths in the graph; in the latter case, no geometrical information is required. Summarized, a bubble algorithm mainly consists of the following three phases, which are also illustrated in figure 1:

Init: A vertex for each partition is determined. These vertices act as the seeds in the first iteration.
Grow: Starting from their seeds, all partitions grow in a breadth-first manner until all vertices are assigned.
Move: All partitions determine their center vertices, which become the seeds in the next iteration.

To our knowledge, and apart from simple greedy heuristics, there are two implementations that apply the bubble framework to solve the FEM graph partitioning problem. The first one is part of a former version of the Party graph partitioning library. There, the implementation of the three phases can roughly be described as follows:

Init: The initial seeds are determined randomly.
Grow: Starting from every seed, a breadth-first algorithm is applied on the graph. During this process the partitions alternately acquire one of their free neighbor vertices until all vertices are assigned (a sketch of this growing phase is given below).
Move: The vertex with the minimal maximal distance to all other vertices of the same partition becomes the new seed.
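The following minimal sketch (illustrative Python with assumed names, not the original Party code) captures such a breadth-first growing phase, in which the partitions take turns acquiring one free neighboring vertex at a time:

from collections import deque

def grow_from_seeds(adj, seeds):
    """Grow all partitions simultaneously in a breadth-first manner.

    adj   : {vertex: [neighbors]} of a connected graph
    seeds : one seed vertex per partition
    Returns {vertex: partition id} once every vertex is assigned.
    """
    part = {s: p for p, s in enumerate(seeds)}
    frontiers = [deque([s]) for s in seeds]
    assigned = len(seeds)
    while assigned < len(adj):
        for p, frontier in enumerate(frontiers):     # partitions take turns
            while frontier:
                v = frontier.popleft()
                free = [w for w in adj[v] if w not in part]
                if not free:
                    continue                         # v has no free neighbors anymore
                w = free[0]                          # acquire one free neighbor
                part[w] = p
                assigned += 1
                frontier.appendleft(v)               # v may still have free neighbors
                frontier.append(w)
                break                                # only one vertex per turn
    return part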

This approach shows several problems. The initial placement of the partitions may be very bad, requiring many iterations until it is fixed, but even then the partition sizes usually vary extremely. The time spent on finding new seeds is quite long since a breadth-first search has to be performed for every vertex. Moreover, the partition quality is not considered at all. Another important disadvantage is that the growing phase cannot be parallelized because vertices are assigned serially and earlier assignments have a great impact on later decisions.

A second approach is described in [10] and, as already mentioned, it has been implemented in a former version of the FEM simulation tool PadFEM:

Init: The first initial seed is randomly chosen among the vertices with smallest degree. Then, to determine the seed for the next partition, a breadth-first search is performed with all existing seeds as starting points. The last vertex found becomes the seed for the next partition. This is repeated until all seeds have been determined.
Grow: The smallest partition with at least one adjacent unassigned vertex grabs the vertex with the smallest Euclidean distance to its seed.
Move: The new seed of a partition becomes the vertex for which the sum of Euclidean distances to all other vertices of the same partition is minimal. To find this vertex quickly, some successive approximation is used.

This algorithm solves some of the problems we have seen in the first approach. The initial seeds are distributed more evenly over the graph. Since the smallest possible partition gathers the next vertex, more attention is paid to balance, and the determination of the center has also been improved to work faster. By including coordinates in the choice of the next vertex, the partitions are usually also geometrically well shaped (and connected), which is the main goal of this approach. Other quality metrics are not considered. By relying on vertex coordinates, this approach is only applicable if these are provided, and sometimes the Euclidean distance does not coincide at all with the path length between vertices. This can often be observed if an FEM mesh contains holes, in which case a partition may be placed around one. It is a general problem when working with coordinates and occurs even more heavily, for example, in space-filling-curve based partitionings [11]. The experiments made in [10] also reveal that the selection mechanism, though improved by preferring under-weighted partitions, still does not lead to sufficiently well balanced domains. Hence, to fix this, some additional computations are added after the last bubble iteration. Concerning a possible parallelization, the situation stays the same as described before because the selection process of the vertices is still strictly serial.

3 Diffusion

In this section we briefly describe diffusive schemes. We assume that these are well known in the graph partitioning community since they are often applied as load balancers. The simplest one, first introduced in [12], is the first order scheme (FOS). Let G = (V, E) be a connected, undirected graph and l_v ∈ R the load of node v ∈ V. With l̄ ∈ R^V we denote the balanced load vector, whose entries all equal Σ_{v∈V} l_v / |V|. Now, the task of a load balancing algorithm is to compute a flow f ∈ R^E such that A f = l − l̄, where A ∈ {−1, 0, +1}^{|V|×|E|} is the node-edge incidence matrix of G.
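As a small illustration of this setting, the following sketch (illustrative Python using numpy; the edge orientation and names are assumptions) builds the incidence matrix A, the balanced load vector l̄, and one way to obtain the l_2-minimal balancing flow f with A f = l − l̄ for a tiny path graph:

import numpy as np

def incidence_matrix(num_nodes, edges):
    # Node-edge incidence matrix A with entries in {-1, 0, +1}: edge e = (i, j)
    # gets +1 in row i and -1 in row j, so positive flow on e moves load from i to j.
    A = np.zeros((num_nodes, len(edges)))
    for e, (i, j) in enumerate(edges):
        A[i, e], A[j, e] = 1.0, -1.0
    return A

# Example: path graph 0 - 1 - 2 with all load on node 0.
edges = [(0, 1), (1, 2)]
l = np.array([6.0, 0.0, 0.0])
l_bar = np.full(3, l.sum() / 3)              # balanced load vector
A = incidence_matrix(3, edges)
L = A @ A.T                                  # Laplacian matrix of the graph
f = A.T @ np.linalg.pinv(L) @ (l - l_bar)    # l2-minimal balancing flow
print(np.allclose(A @ f, l - l_bar), f)      # True [4. 2.]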
FOS performs on each node v_i ∈ V the iteration [13]:

x_e^{k-1} = α (l_i^{k-1} − l_j^{k-1})   for all e = (v_i, v_j) ∈ E,
f_e^k = f_e^{k-1} + x_e^{k-1},
l_i^k = l_i^{k-1} − Σ_{e=(v_i,v_j) ∈ E} x_e^{k-1},

where x_e^k is the amount of load exchanged via edge e in iteration k and α is a properly chosen parameter, e.g. α = (1 + max_{v∈V} deg(v))^{-1}. (Note that better values of α can be computed, as shown for example in [14].) In matrix notation, this can be written as l^k = M l^{k-1} with the diffusion matrix M = I − αL ∈ R^{|V|×|V|}. Here, L = A A^T is the Laplacian matrix of G. For the error ε^k = l^k − l̄ it holds that ||ε^k||_2 ≤ γ^k ||ε^0||_2, where γ = max_{i=2,...,m} |µ_i| is the second largest absolute value of the m distinct eigenvalues 1 = µ_1 > ... > µ_m > −1 of M. Moreover, it has been shown that the calculated flows f^k (like in all diffusion schemes) converge toward the l_2-minimal balancing flow [15].

FOS has some interesting properties. First, in every iteration communication only occurs between adjacent vertices. Thus, no global view of the graph G is required, and all calculations in one iteration of the diffusion scheme can be performed in parallel on all edges and vertices, respectively. The second observation we make is that load tends to spread faster in regions of the graph where more distinct paths between two nodes exist. We call these regions densely connected. Figure 2 gives an example. The biplane9 graph shown is a square-based mesh, and some parts of the mesh, mainly around the two air-wings, have been refined more often than others. Hence, about half of the vertices located at the transition between these regions contain more edges toward the finer part. The load distribution in this example originates at a single vertex close to such a transition. One can see that, on the one hand, the amount of load received by a vertex depends on its distance (path length) to this source; the reason for this is obvious when looking at the iteration scheme. On the other hand, vertices at the same distance to the seed but placed before a transition receive more load than those behind it. A similar behavior can be observed in refined graphs that do not contain the described T-intersections. This leads to the idea of this paper, which we present in the next section.

Figure 2. Load distribution originating at a single seed after 50 FOS iterations. Vertices with high load are colored red while empty vertices are black.
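A minimal sketch of this iteration (illustrative Python; function and parameter names are assumptions, and the flow accumulation is omitted):

def fos(adj, load, alpha, iterations):
    """First order scheme (FOS) on a graph given as adjacency lists.

    adj        : {vertex: [neighbors]} (undirected, so both directions are listed)
    load       : {vertex: initial load l_v}
    alpha      : diffusion parameter, e.g. 1.0 / (1 + maximum degree)
    iterations : number of FOS iterations k
    Returns the load vector after k iterations; the flow f could be
    accumulated per edge in the same loop.
    """
    l = dict(load)
    for _ in range(iterations):
        nxt = dict(l)
        for v, neighbors in adj.items():
            for w in neighbors:
                # net load v sends to w over edge (v, w); the matching
                # update for w happens when the pair (w, v) is processed
                nxt[v] -= alpha * (l[v] - l[w])
        l = nxt
    return l

# Example: a path 0 - 1 - 2 with all load on vertex 0 slowly balances out.
adj = {0: [1], 1: [0, 2], 2: [1]}
print(fos(adj, {0: 6.0, 1: 0.0, 2: 0.0}, alpha=1.0 / 3, iterations=50))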

4 DB - A Diffusive Bubble Algorithm

In this section we describe how we integrate diffusion into the bubble framework. The main idea is based on the observation described in the last section: load primarily diffuses into densely connected regions of the graph rather than into sparsely connected ones. Following this observation, we expect to identify sets of vertices that possess a high number of internal and a small number of external edges.

This procedure shows many similarities to graph clustering algorithms, with the difference that we fix the number of clusters to P by executing the diffusion algorithm exactly P times, each time with a different kind of load. To distinguish between these loads, we color them with colors from 1 to P, respectively. After all load is distributed, we assign the vertices to the partition they obtained load from. If a vertex contains more than one kind of load, meaning that it could be part of more than one cluster, it is assigned to the partition from which it received the highest amount. Hence, a partition can crowd others out of parts of the graph if it itself already contains a higher load nearby. These dynamic movements are addressed by the bubble framework. During its iterations, the partition centers, from which the diffusion process is initiated, are rearranged such that they are finally well distributed over the graph and are preferably placed within a densely connected region. Summarized, the three phases of the bubble framework can now be described as follows:

Init: The initial seeds are determined randomly, although other methods could be implemented easily. For each partition, |V|/P load of its color is placed on its seed vertex. All other vertices stay empty.
Grow: We use FOS as the growing mechanism. The load distribution is computed independently for all partitions (colors) and is stopped after k iterations, far before all vertices contain an equal amount of load.
Move (Contraction): The vertex containing the most load of color p becomes the new seed of partition p. At the beginning of the next iteration, only the seed vertices contain |V|/P load; all other vertices stay empty.
Move (Consolidation): All vertices are assigned to the partition they have received the highest amount of load from. To prepare for the next iteration, each partition distributes |V|/P load evenly among all its vertices.

The Init phase is identical to the former version of the Party library. As mentioned, a random distribution may place vertices suboptimally, but on the other hand it is more likely that vertices in dense regions are chosen. To grow partitions, we use FOS. Note that, in contrast to the implementations described in section 2, all partitions operate independently. The decision about the vertex assignments is delayed and integrated into the movement process. Hence, this is the most interesting point, also because two different methods are applied. The first operation, called contraction, comes close to the former movement implementations described in section 2. A single vertex in the center of each partition, containing the maximal load of the according color, is determined and becomes the new seed for the next iteration. However, since only a few FOS iterations are performed, this will very likely be the same vertex that initiated the diffusion process in the current iteration; hence, no movement would occur. To fix this, we introduce a second operation which we call consolidation. In contrast to the contraction, not a single vertex is assigned as the new seed but the whole partition is used. Since more vertices lie within the dense regions of the graph, this operation directs the partition toward its desired position and also ensures that in the following loops a different vertex contains the maximum load, as long as the final state has not been reached. Furthermore, the consolidation step helps to avoid numerical problems that could otherwise occur if the load values became too small after executing too many FOS iterations. The resulting diffusive bubble algorithm (DB) is sketched in figure 3.

00 Algorithm DB(G, i, j, k)
01   in each iteration i
02     if i = 1
03       determine-seeds(G)
04     else
05       parallel for each partition p
06         distribute-load(G, p)
07         FOS(G, k)
08       contraction(G)
09     in each loop j
10       parallel for each partition p
11         distribute-load(G, p)
12         FOS(G, k)
13       π = consolidation(G)
14   return π

Figure 3. Sketch of the DB algorithm.

The input consists of the graph G, which is also capable of storing load and flow vectors, and several parameters i, j and k, all specifying the number of the different iterations to be performed. The outer iteration (line 01), executed at least once, always starts by determining a single seed vertex for each of the partitions. Only in the first iteration are random vertices chosen (line 03). In the following iterations the seeds are determined by a contraction step (lines 05-08), consisting, for each partition p independently, of an equal load distribution (following a consolidation step) (line 06) and k FOS iterations (line 07). The determination of the vertex that contains the highest amount of load is performed in the contraction (line 08). It is followed by j loops (line 09) containing a consolidation step. This again involves an equal distribution of load on all vertices of a particular partition (line 11) and the FOS iterations (line 12). Note that during the first loop following a contraction, all partitions consist only of a single vertex, the seed. The last operation, the consolidation itself (line 13), determines the maximal load of each color on every vertex and therefore also the resulting partitioning π.

The DB algorithm mainly consists of a collection of loops. Of those, the loops over the partitions (lines 05 and 10) and the FOS iterations are independent and can therefore be fully parallelized. The consolidation is also a per-vertex operation; only the maximum computation during the contraction requires a more global view of the whole partition. Another interesting point is that DB does not contain any explicit objectives; these are hidden inside the growth and movement operations.

As with all other bubble implementations, some difficulties arise in establishing the balance of the partitionings. However, adjusting the total load of a domain (either placed on a single vertex or equally distributed on all of them) provides a handy method to address this problem. Instead of supplying all partitions with the same load amount W = |V|/P, we increase the load on under-weighted and decrease it on over-weighted partitions. To prevent flipping, a damping factor ∆ is included. The total load amount L_p^j placed on partition p in loop j is calculated as

δ_p^{j-1} = W − W_p^{j-1},    θ_p^j = (1 − ∆) θ_p^{j-1} + ∆ δ_p^{j-1},    L_p^j = W + θ_p^j,

where W_p^{j-1} denotes p's weight (size) after the last consolidation and is set to W_p^0 = W during a contraction. This procedure grants us some control over the partition sizes. However, our experiments show that in some cases this does not suffice. Furthermore, the load balancing process is sometimes very slow and takes many iterations to show an effect, especially if the under-weighted and over-weighted partitions are situated far from each other. Thus, more advanced methods are needed, as discussed in section 6.
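A compact sketch of the control flow in figure 3 (illustrative Python, not the author's implementation; it reuses a FOS routine like the one sketched in section 3, keeps the load amount fixed at W = |V|/P instead of the adaptive L_p^j, and assumes every partition keeps at least one vertex):

import random

def db_partition(adj, P, iters, loops, k, fos):
    """Illustrative sketch of the DB control flow from figure 3.

    adj   : {vertex: [neighbors]};  P : number of partitions
    iters, loops, k : the parameters i, j and k of the algorithm
    fos   : a first order diffusion routine like the one sketched in section 3
    """
    W = len(adj) / P
    alpha = 1.0 / (1 + max(len(n) for n in adj.values()))
    members = [[s] for s in random.sample(list(adj), P)]   # Init: one random seed per color

    def diffuse_all():
        # distribute-load followed by k FOS iterations, independently for every color p
        loads = []
        for p in range(P):
            mem, share = set(members[p]), W / len(members[p])
            init = {v: (share if v in mem else 0.0) for v in adj}
            loads.append(fos(adj, init, alpha, k))
        return loads

    part = {}
    for i in range(iters):
        if i > 0:                                          # contraction step
            loads = diffuse_all()
            members = [[max(adj, key=loads[p].get)] for p in range(P)]   # new seeds
        for _ in range(loops):                             # consolidation loops
            loads = diffuse_all()
            part = {v: max(range(P), key=lambda q: loads[q][v]) for v in adj}
            members = [[v for v in adj if part[v] == p] for p in range(P)]
    return part

With the default parameters used in section 5, a call would look like db_partition(adj, 16, 4, 25, 25, fos).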
Figure 4 shows an example of the load distribution after 4 iterations (25 loops and 25 FOS iterations each) on the biplane9 graph. Note that only the dominant amount of load on a vertex is displayed. The partitions in the lower left part of the graph are smaller than their counterparts on the right, which can easily be recognized because they have been assigned a higher total load. With 1409 vertices, the weight of the heaviest partition is about 4% too high, which might be noticed in the resulting partitioning displayed in figure 5.

5 First Experimental Results

In this section we present some results obtained with a sequential implementation of the proposed DB algorithm and compare them to solutions of the (sequential) partitioning libraries kmetis, Jostle and Party. All of the heuristics are invoked with their default parameters; in case of the DB algorithm this means 4 iterations with 25 loops and FOS iterations each and a damping factor of ∆ = 1/4. Furthermore, we replace the global α introduced for the FOS iterations in section 3 by a separate one for every edge e = (v_i, v_j) and set it to α_e = (1 + max(deg(v_i), deg(v_j)))^{-1}.

Our experiments are based on the set of graphs shown in table 1. Some of these graphs have been frequently used to compare graph partitioning heuristics, while we included some others from the PadFEM simulation tool. Note that all graphs are FEM graphs (or their duals) of a relatively small size; the reasons for this limitation are discussed in section 6.

To judge the quality of a partitioning, several metrics are possible. The classical one is the edge cut, that is, the number of edges between vertices of different partitions.

Figure 4. The maximal load per vertex after 4 iterations.

Figure 5. The resulting partitioning after 4 iterations.

Since it is known that this metric does not model the real communication costs of FEM applications, we also measure the number of external edges, the number of boundary vertices and the communication volume (send and receive), assuming that vertices represent information and edges the communication pattern. For each partition p these metrics can be described as follows:

external edges: The number of edges that are incident to exactly one vertex of partition p.
boundary vertices: The number of vertices of partition p that are adjacent to at least one vertex from a different partition.
send volume: The amount of outgoing information, i.e. the sum over all vertices residing inside partition p of the number of adjacent partitions different from p.
receive volume: The amount of incoming information, i.e. the number of vertices of partitions different from p that are adjacent to at least one vertex of partition p.

Note that Metis is also capable of minimizing the communication volume [1]; however, we did not make use of this option. Furthermore, for each metric we consider three different norms. Given the values x_1, ..., x_P, the norms are defined as follows:

l_1: ||X||_1 = |x_1| + ... + |x_P|
l_2: ||X||_2 = (x_1^2 + ... + x_P^2)^{1/2}
l_∞: ||X||_∞ = max(|x_1|, ..., |x_P|)

The l_1-norm (summation norm) is a global norm; the global edge cut belongs into this category (it equals half the number of external edges in this norm). In contrast to the l_1-norm, the l_∞-norm (maximum norm) is a local norm that only considers the worst value. This norm is favorable if synchronized processes are involved. The l_2-norm (Euclidean norm) lies in between the l_1- and the l_∞-norm and reflects the global situation as well as local peaks.

It has been shown that comparisons based on a single test per graph usually do not lead to meaningful conclusions. Therefore, we apply the permutation based evaluation scheme from [3]. We perform 100 runs for the multilevel heuristics but reduce this number to 10 for the DB algorithm due to its longer run-time. The results are summarized in a collection of charts generated by a script. Due to space limitations, we restrict our presentation to 16-partitionings and include only one chart collection. We also omit the timings because our current sequential implementation of the DB algorithm is not optimized; thus, its run-time is much higher than those of the multilevel heuristics. While on modern computers the latter take only a fraction of a second to compute a solution, the DB algorithm needs about a minute to compute a 16-partitioning of the biplane9 graph. Possible enhancements to DB are discussed in section 6.

Figure 6 gives the detailed results we obtained dividing the biplane9 graph into 16 domains. Each of the charts contains values for the different partitioning libraries and displays the average of 100 (10) runs as well as the standard deviation and the extremes, respectively.
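The following sketch (illustrative Python with assumed names) computes these four per-partition metrics and the three norms directly from the definitions above:

import math

def partition_metrics(adj, part, p):
    """External edges, boundary vertices, send and receive volume of partition p."""
    external, send = 0, 0
    boundary, receivers = set(), set()
    for v, neighbors in adj.items():
        if part[v] == p:
            foreign = {part[w] for w in neighbors if part[w] != p}
            external += sum(1 for w in neighbors if part[w] != p)
            send += len(foreign)                 # v is sent once to each adjacent partition
            if foreign:
                boundary.add(v)
        elif any(part[w] == p for w in neighbors):
            receivers.add(v)                     # v has to be received by partition p
    return external, len(boundary), send, len(receivers)

def norms(values):
    # l1-, l2- and l-infinity norm of the per-partition values x_1, ..., x_P.
    return (sum(abs(x) for x in values),
            math.sqrt(sum(x * x for x in values)),
            max(abs(x) for x in values))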

Table 1. Graphs used in this paper and some of their properties (columns: graph, |V|, |E|, minimum, average and maximum degree, diameter, and origin). The test set consists of the 2D FEM graphs grid100, airfoil1, biplane9 and crack, the 2D duals airfoil1 dual and crack dual, and the 3D FEM graphs (or duals) cs4, padfem and sphere.

The cut chart compares the classical global edge cut. Note that this is equivalent to half the number of external edges measured in the l_1-norm. One can see that the DB algorithm finds solutions comparable to those of kmetis and Jostle without imbalance allowance. However, much better results exist, as for example demonstrated by Party. Looking at the balance chart, this becomes even more visible because the solutions computed by the DB algorithm are less balanced than those of the other libraries. As already mentioned, the results concerning the number of external edges match those for the global edge cut in the l_1-norm. Regarding the l_2- and l_∞-norm, the situation is similar, with a decreasing deviation between the libraries.

A different picture is drawn by the metrics that model the real communication costs more closely. Concerning both the boundary vertices and the communication volume, the solutions computed by the DB algorithm are much better and result in shorter boundaries and fewer messages than those of any other library. This advantage is especially noticeable in the l_1- and l_2-norm, but also exists in the l_∞-norm. This means smaller communication costs and a better distribution among the partitions. Remembering the edge cuts, it becomes clear that the two objectives, minimizing the sum of external edges and minimizing the maximal number of messages, do not coincide well in case of the biplane9 graph.

As seen in figure 6, though relative differences exist, all three norms show the same tendency, which is also true for all other graphs of our test set. Therefore, we restrict the results of the latter to the l_2-norm without omitting too much information. The summarized results can be found in table 2. These can be roughly categorized into two groups: the first one is the set of graphs that leads to similar results as obtained for the biplane9 graph. This holds for grid100, crack dual, cs4, and padfem. On the other hand, our test set contains some instances where DB finds comparable solutions concerning the edge cut but does not deliver better partitionings when looking at the communication volume. This is the case for crack, sphere and airfoil1. The airfoil1 dual lies between these groups. Whether this can be explained by different graph properties or by a varying performance of the heuristics is unclear.

As already mentioned, the DB algorithm shows weaknesses in computing balanced partitions on some graphs. In case of the airfoil1 graph, the largest partition contains about 5% too many vertices. On the other hand, the absolute number of excess vertices is quite small, and we think that these small partition sizes partly cause the problem. An idea to address this problem is discussed in section 6. Another observation is that the standard deviation of the communication volume metrics is often smaller for DB than for the other heuristics. This can be explained by the different metric the multilevel heuristics optimize. What cannot be seen in table 2 is the partitions' shape. In all cases, only connected partitions have been computed by the DB algorithm and, if the observations from 2D graphs also hold for 3D, their shape is very compact. A notably nice example is shown in figure 7.
6 Conclusion and Further Work

In this paper we have proposed a new approach to partition FEM graphs by merging the bubble framework and the first order diffusion scheme. The resulting algorithm requires many computations, hence it performs slowly when executed serially, but on the other hand it is very simple and contains a high degree of possible parallelism. Our results show that its solutions are comparable to those of state-of-the-art partitioning heuristics and even outperform them on most of our test graphs concerning the boundary and communication volume metrics. Thus, we think that the DB algorithm has some interesting potential. But before it can be applied in real computations, numerous questions have to be answered.

From the practical point of view, the most urgent need is to decrease the algorithm's run-time. There are several possible ways to achieve this.

Figure 6. Detailed results obtained for the biplane9 graph: charts for the cut, the balance, and the external edges, boundary vertices, send and receive volumes, each in the l_1-, l_2- and l_∞-norm. Results are shown (from left to right) for: kmetis (dark blue triangles), pmetis (light blue triangles), Jostle (squares) with 0% (yellow), 1% (orange) and 3% (red) imbalance allowance, Party (green diamonds), Party (black circles) and DB (magenta circles). Each bar displays the average value of 100 (10 in case of DB) independent runs with a large mark, the standard deviation of the values with a wide bar, the minimum and maximum values with thin bars, and the result for the first run (on the original, not randomized instance of the graph [3]) with a small mark.

Table 2. Overview of some average results and the standard deviation (±) regarding the l_2-norm, listing for each graph (grid100, airfoil1, airfoil1 dual, biplane9, crack, crack dual, cs4, padfem and sphere) and each partitioner (kmetis, Jostle, Party and DB) the global cut, the balance, and the l_2-norm values of the external edges, boundary vertices, send and receive volumes. The default imbalance allowance of kmetis and Jostle is 3% while Party uses 1%. Note that the DB algorithm does not provide any explicit balancing option.

During our experiments we have observed that using the CPU's cache properly can lead to an up to 5 times faster execution. Furthermore, the parallelism of the algorithm should be exploited on both shared and distributed memory machines, as well as on combinations thereof. Though this will definitely shorten the execution time, we doubt that this alone already allows partitioning FEM graphs containing several million vertices. The computations performed inside the FOS iterations are simple; hence, another possibility to speed them up is the use of dedicated hardware other than the main CPUs.

Another point to be addressed is the slow propagation of over- and underweight between the partitions. Many iterations are required until a high load is recognized at the other end of the graph, especially if many well balanced partitions are in the way. The integration of an additional explicit load balancing component could help to improve this process and also take care of those cases where the implicit mechanism does not work properly.

Furthermore, it has been shown that FOS is also applicable to weighted graphs. Thus, the bubble framework can be merged with the multilevel approach. First experiments have been successful, but in some cases the partitions on lower levels converge to different constellations than they do on higher ones, making the additional effort useless. We think that better parameters and an adapted graph coarsening strategy might fix this problem.

The question of good parameters is also of theoretical interest. In the experiments presented, we set the number of iterations to 4 while performing 25 loops and FOS iterations, respectively. This works well for the graphs included in our test set, all of which are FEM originated and of relatively small size. But even for these graphs the chosen constants are definitely not optimal, and we would expect good parameters to depend on the graphs' properties. A better understanding here would also answer the question on what kinds of graphs the DB algorithm is applicable. Another important question to be solved is the metric that the DB algorithm optimizes. From our experiments we can see that it might be closely related to the local communication volume. Also, the partition shapes are very compact (e.g. figure 7). However, much more work addressing this question has to be done. Finally, we can think of integrating DB into the state-of-the-art partitioning libraries, either as a pre- or postprocessor or as a global partitioner in the lower levels of the multilevel schemes, improving the otherwise difficult overall partition placement.

Figure 7. The 16 partitions of a 100x100 grid after 25 iterations.

References

[1] G. Karypis and V. Kumar, Metis User Manual, Version 4.0. [Online]. Available: www-users.cs.umn.edu/~karypis/metis/metis/files/manual.ps
[2] M. Cross and C. Walshaw, Parallel optimisation algorithms for multilevel mesh partitioning, Parallel Computing, vol. 26, no. 12.
[3] S. Schamberger, Improvements to the helpful-set heuristic and a new evaluation scheme for graph partitioners, in International Conference on Computational Science and its Applications, ICCSA 03, ser. LNCS, no. 2667, 2003.
[4] B. Hendrickson and R. Leland, A multi-level algorithm for partitioning graphs, in Supercomputing 95. ACM/IEEE Press.
[5] B. W. Kernighan and S. Lin, An efficient heuristic for partitioning graphs, Bell Systems Technical Journal, vol. 49.
[6] G. Karypis and V. Kumar, Parallel multilevel k-way partitioning scheme for irregular graphs, in Supercomputing 96. ACM/IEEE Press, 1996, p. 35.
[7] K. Devine and B. Hendrickson, Dynamic load balancing in computational mechanics, Computational Methods in Applied Mechanical Engineering, vol. 184.
[8] B. Hendrickson, Graph partitioning and parallel solvers: Has the emperor no clothes?, in Irregular 98, ser. LNCS, no. 1457, 1998.
[9] R. Biswas and L. Oliker, PLUM: Parallel load balancing for adaptive unstructured meshes, Parallel and Distributed Computing, vol. 51, no. 2.
[10] R. Diekmann, R. Preis, F. Schlimbach, and C. Walshaw, Shape-optimized mesh partitioning and load balancing for parallel adaptive FEM, Parallel Computing, vol. 26.
[11] S. Schamberger and J. M. Wierum, Graph partitioning in scientific simulations: Multilevel schemes vs. space-filling curves, in Parallel Computing Technologies, PACT 03, ser. LNCS, no. 2763, 2003.
[12] G. Cybenko, Load balancing for distributed memory multiprocessors, Parallel and Distributed Computing, vol. 7.
[13] R. Elsässer, B. Monien, and R. Preis, Diffusion schemes for load balancing on heterogeneous networks, Theory of Computing Systems, vol. 35.
[14] R. Elsässer, B. Monien, and S. Schamberger, Toward optimal diffusion matrices, in International Parallel and Distributed Processing Symposium, IPDPS 02, 2002, p. 67 (CD).
[15] R. Diekmann, A. Frommer, and B. Monien, Efficient schemes for nearest neighbor load balancing, Parallel Computing, vol. 25, no. 7.


Introduction to Parallel & Distributed Computing Parallel Graph Algorithms Introduction to Parallel & Distributed Computing Parallel Graph Algorithms Lecture 16, Spring 2014 Instructor: 罗国杰 gluo@pku.edu.cn In This Lecture Parallel formulations of some important and fundamental

More information

CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning

CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning Parallel sparse matrix-vector product Lay out matrix and vectors by rows y(i) = sum(a(i,j)*x(j)) Only compute terms with A(i,j) 0 P0 P1

More information

Improvements in Dynamic Partitioning. Aman Arora Snehal Chitnavis

Improvements in Dynamic Partitioning. Aman Arora Snehal Chitnavis Improvements in Dynamic Partitioning Aman Arora Snehal Chitnavis Introduction Partitioning - Decomposition & Assignment Break up computation into maximum number of small concurrent computations that can

More information

Space Filling Curves and Hierarchical Basis. Klaus Speer

Space Filling Curves and Hierarchical Basis. Klaus Speer Space Filling Curves and Hierarchical Basis Klaus Speer Abstract Real world phenomena can be best described using differential equations. After linearisation we have to deal with huge linear systems of

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can

More information

Unsupervised Learning : Clustering

Unsupervised Learning : Clustering Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex

More information

6 Distributed data management I Hashing

6 Distributed data management I Hashing 6 Distributed data management I Hashing There are two major approaches for the management of data in distributed systems: hashing and caching. The hashing approach tries to minimize the use of communication

More information

A STUDY OF LOAD IMBALANCE FOR PARALLEL RESERVOIR SIMULATION WITH MULTIPLE PARTITIONING STRATEGIES. A Thesis XUYANG GUO

A STUDY OF LOAD IMBALANCE FOR PARALLEL RESERVOIR SIMULATION WITH MULTIPLE PARTITIONING STRATEGIES. A Thesis XUYANG GUO A STUDY OF LOAD IMBALANCE FOR PARALLEL RESERVOIR SIMULATION WITH MULTIPLE PARTITIONING STRATEGIES A Thesis by XUYANG GUO Submitted to the Office of Graduate and Professional Studies of Texas A&M University

More information

Parallel Graph Partitioning on a CPU-GPU Architecture

Parallel Graph Partitioning on a CPU-GPU Architecture Parallel Graph Partitioning on a CPU-GPU Architecture Bahareh Goodarzi Martin Burtscher Dhrubajyoti Goswami Department of Computer Science Department of Computer Science Department of Computer Science

More information

CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS

CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS 1 UNIT I INTRODUCTION CS6702 GRAPH THEORY AND APPLICATIONS 2 MARKS QUESTIONS AND ANSWERS 1. Define Graph. A graph G = (V, E) consists

More information

Lecture 3: Art Gallery Problems and Polygon Triangulation

Lecture 3: Art Gallery Problems and Polygon Triangulation EECS 396/496: Computational Geometry Fall 2017 Lecture 3: Art Gallery Problems and Polygon Triangulation Lecturer: Huck Bennett In this lecture, we study the problem of guarding an art gallery (specified

More information

MCL. (and other clustering algorithms) 858L

MCL. (and other clustering algorithms) 858L MCL (and other clustering algorithms) 858L Comparing Clustering Algorithms Brohee and van Helden (2006) compared 4 graph clustering algorithms for the task of finding protein complexes: MCODE RNSC Restricted

More information

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing

More information

An Edge-Swap Heuristic for Finding Dense Spanning Trees

An Edge-Swap Heuristic for Finding Dense Spanning Trees Theory and Applications of Graphs Volume 3 Issue 1 Article 1 2016 An Edge-Swap Heuristic for Finding Dense Spanning Trees Mustafa Ozen Bogazici University, mustafa.ozen@boun.edu.tr Hua Wang Georgia Southern

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Quasi-Dynamic Network Model Partition Method for Accelerating Parallel Network Simulation

Quasi-Dynamic Network Model Partition Method for Accelerating Parallel Network Simulation THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. 565-871 1-5 E-mail: {o-gomez,oosaki,imase}@ist.osaka-u.ac.jp QD-PART (Quasi-Dynamic network model PARTition

More information

I. Meshing and Accuracy Settings

I. Meshing and Accuracy Settings Guidelines to Set CST Solver Accuracy and Mesh Parameter Settings to Improve Simulation Results with the Time Domain Solver and Hexahedral Meshing System illustrated with a finite length horizontal dipole

More information

Graph and Hypergraph Partitioning for Parallel Computing

Graph and Hypergraph Partitioning for Parallel Computing Graph and Hypergraph Partitioning for Parallel Computing Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology June 29, 2016 Graph and hypergraph partitioning References:

More information

CS 5220: Parallel Graph Algorithms. David Bindel

CS 5220: Parallel Graph Algorithms. David Bindel CS 5220: Parallel Graph Algorithms David Bindel 2017-11-14 1 Graphs Mathematically: G = (V, E) where E V V Convention: V = n and E = m May be directed or undirected May have weights w V : V R or w E :

More information

LECTURES 3 and 4: Flows and Matchings

LECTURES 3 and 4: Flows and Matchings LECTURES 3 and 4: Flows and Matchings 1 Max Flow MAX FLOW (SP). Instance: Directed graph N = (V,A), two nodes s,t V, and capacities on the arcs c : A R +. A flow is a set of numbers on the arcs such that

More information

PuLP. Complex Objective Partitioning of Small-World Networks Using Label Propagation. George M. Slota 1,2 Kamesh Madduri 2 Sivasankaran Rajamanickam 1

PuLP. Complex Objective Partitioning of Small-World Networks Using Label Propagation. George M. Slota 1,2 Kamesh Madduri 2 Sivasankaran Rajamanickam 1 PuLP Complex Objective Partitioning of Small-World Networks Using Label Propagation George M. Slota 1,2 Kamesh Madduri 2 Sivasankaran Rajamanickam 1 1 Sandia National Laboratories, 2 The Pennsylvania State

More information

Note Set 4: Finite Mixture Models and the EM Algorithm

Note Set 4: Finite Mixture Models and the EM Algorithm Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for

More information

Piecewise-Planar 3D Reconstruction with Edge and Corner Regularization

Piecewise-Planar 3D Reconstruction with Edge and Corner Regularization Piecewise-Planar 3D Reconstruction with Edge and Corner Regularization Alexandre Boulch Martin de La Gorce Renaud Marlet IMAGINE group, Université Paris-Est, LIGM, École Nationale des Ponts et Chaussées

More information

Scalable Dynamic Adaptive Simulations with ParFUM

Scalable Dynamic Adaptive Simulations with ParFUM Scalable Dynamic Adaptive Simulations with ParFUM Terry L. Wilmarth Center for Simulation of Advanced Rockets and Parallel Programming Laboratory University of Illinois at Urbana-Champaign The Big Picture

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank

More information

A substructure based parallel dynamic solution of large systems on homogeneous PC clusters

A substructure based parallel dynamic solution of large systems on homogeneous PC clusters CHALLENGE JOURNAL OF STRUCTURAL MECHANICS 1 (4) (2015) 156 160 A substructure based parallel dynamic solution of large systems on homogeneous PC clusters Semih Özmen, Tunç Bahçecioğlu, Özgür Kurç * Department

More information

Scalable Clustering of Signed Networks Using Balance Normalized Cut

Scalable Clustering of Signed Networks Using Balance Normalized Cut Scalable Clustering of Signed Networks Using Balance Normalized Cut Kai-Yang Chiang,, Inderjit S. Dhillon The 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) Oct.

More information

Some aspects of parallel program design. R. Bader (LRZ) G. Hager (RRZE)

Some aspects of parallel program design. R. Bader (LRZ) G. Hager (RRZE) Some aspects of parallel program design R. Bader (LRZ) G. Hager (RRZE) Finding exploitable concurrency Problem analysis 1. Decompose into subproblems perhaps even hierarchy of subproblems that can simultaneously

More information

New Challenges In Dynamic Load Balancing

New Challenges In Dynamic Load Balancing New Challenges In Dynamic Load Balancing Karen D. Devine, et al. Presentation by Nam Ma & J. Anthony Toghia What is load balancing? Assignment of work to processors Goal: maximize parallel performance

More information

Introduction to Graph Theory

Introduction to Graph Theory Introduction to Graph Theory Tandy Warnow January 20, 2017 Graphs Tandy Warnow Graphs A graph G = (V, E) is an object that contains a vertex set V and an edge set E. We also write V (G) to denote the vertex

More information

Complementary Graph Coloring

Complementary Graph Coloring International Journal of Computer (IJC) ISSN 2307-4523 (Print & Online) Global Society of Scientific Research and Researchers http://ijcjournal.org/ Complementary Graph Coloring Mohamed Al-Ibrahim a*,

More information

Segmentation and Grouping

Segmentation and Grouping Segmentation and Grouping How and what do we see? Fundamental Problems ' Focus of attention, or grouping ' What subsets of pixels do we consider as possible objects? ' All connected subsets? ' Representation

More information

Welcome to the course Algorithm Design

Welcome to the course Algorithm Design Welcome to the course Algorithm Design Summer Term 2011 Friedhelm Meyer auf der Heide Lecture 13, 15.7.2011 Friedhelm Meyer auf der Heide 1 Topics - Divide & conquer - Dynamic programming - Greedy Algorithms

More information

Part 3: Image Processing

Part 3: Image Processing Part 3: Image Processing Image Filtering and Segmentation Georgy Gimel farb COMPSCI 373 Computer Graphics and Image Processing 1 / 60 1 Image filtering 2 Median filtering 3 Mean filtering 4 Image segmentation

More information

Combinatorial Maps. University of Ljubljana and University of Primorska and Worcester Polytechnic Institute. Maps. Home Page. Title Page.

Combinatorial Maps. University of Ljubljana and University of Primorska and Worcester Polytechnic Institute. Maps. Home Page. Title Page. Combinatorial Maps Tomaz Pisanski Brigitte Servatius University of Ljubljana and University of Primorska and Worcester Polytechnic Institute Page 1 of 30 1. Maps Page 2 of 30 1.1. Flags. Given a connected

More information