A Modified Inertial Method for Loop-free Decomposition of Acyclic Directed Graphs

MACRo 2015, 5th International Conference on Recent Achievements in Mechatronics, Automation, Computer Science and Robotics

Dániel A. DREXLER 1, Péter ARATÓ 2

1 Department of Control Engineering and Information Technology, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, Budapest, Hungary, e-mail: drexler@iit.bme.hu
2 Department of Control Engineering and Information Technology, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, Budapest, Hungary, e-mail: arato@iit.bme.hu

Manuscript received January 12, 2015, revised February 9, 2015.

DOI: 10.1515/macro-2015-0006

Abstract: Graph decomposition is a key process in system-level synthesis, whether it is used for allocation (e.g. hardware-software partitioning) or as a simple preprocessing step (e.g. for pipelining). Acyclic graphs are usually desirable in design processes, so preserving acyclicity during decomposition is crucial. We propose a modified inertial decomposition that produces loop-free results. We assign coordinates to the nodes based on their maximal distance from the inputs, and give an algorithm that finds the required number of cuts in polynomial time while balancing the size of the segments and seeking a minimal number of edges along the cuts.

Keywords: loop-free decomposition, inertial method, directed acyclic graph.

1. Introduction

Decomposition of graphs into segments is an important task in system-level synthesis. Decomposition is needed either to partition the problem into appropriate segments and assign them to multiple processing elements, or to partition it into segments as a preprocessing step for high-level synthesis [1]. Graph decomposition has an extensive literature; e.g. [2] and [3] present decomposition methods and tools used in computer science, along with a brief literature review, and [4] gives theoretical results on graph decomposition. Popular decomposition algorithms are the Kernighan-Lin algorithm [5] and spectral decomposition [6,7]. Hardware-software partitioning is considered in [8,9,10]. The main disadvantage of these methods is that they effectively handle only undirected graphs: if the input is a directed acyclic graph, they cannot guarantee that the result is also acyclic.

Acyclic directed graphs are important in system-level synthesis, since loops in cyclic directed graphs increase the complexity of system design due to the possibly unpredictable behavior of the task caused by the loops. Creating loops during decomposition thus needs to be avoided if possible [11].

A loop-free decomposition algorithm has already been proposed in [12]. That algorithm creates a cutting list with a heuristic, and the decomposition is based on this cutting list. It starts with a segment containing the output nodes and grows this segment by adding those nodes that do not cause a loop, thus giving the list of possible cuts. The choice of the nodes is not unique, and it strongly affects the result of the decomposition. However, a good strategy for this choice is not yet known, since the actual decomposition is done sequentially after the cutting list is generated.

We propose a strategy different from the one in [12] that also guarantees a loop-free result, which we prove in Section 2. We assign a coordinate to each node, namely its maximal distance from the input nodes, and call it the depth of the node. The segmentation is based on the depth of the nodes: a node is assigned to a given segment if its depth lies between given bounds. We propose an algorithm that finds the cuts in polynomial time and also attempts to minimize a given cost function. Here the cost function is considered during the decomposition, and not in a preprocessing step as in [12]. We demonstrate the algorithm on a graph of 20 nodes with a cost function that attempts to find balanced segments, and with another one that attempts to find balanced segments with a minimal number of edges along the cuts. The paper ends with the conclusion in Section 3.

2. Decomposition of acyclic graphs

A. Notations and terminology

Suppose that the problem to be decomposed is described as a connected acyclic directed graph. Let V denote the set of nodes and E the set of edges. Let the graph have n nodes, denoted by $v_1, v_2, \ldots, v_n$. Let $e_{i,j} = 1$ if there is a directed edge from node $v_i$ to node $v_j$, and $e_{i,j} = 0$ otherwise. Let $V_I$ denote the set of input nodes and $V_O$ the set of output nodes.

B. Inertial decomposition

The inertial decomposition [13,14] first assigns coordinates to the nodes, then treats them as point masses and finds the principal axes that decompose the graph into balanced parts. The algorithm is usually applied with two- or three-dimensional coordinates, and is used for distributing the grid points of numerical methods over a multiprocessor structure. The coordinates of the nodes are thus usually coordinates of a grid on which e.g. a partial differential equation needs to be solved, and the decomposition is based on these coordinates. This ensures that calculations on grid points close to each other are done on the same or on adjacent processors. The main advantage of this method is that finding the principal axes is usually easy. Its disadvantages are that assigning coordinates is not easy if the problem is not a numerical method formulated on a grid, and that the decomposition does not guarantee a loop-free result.

C. Loop-free decomposition

We propose a method to assign one-dimensional coordinates to the nodes such that a loop-free decomposition is guaranteed if the segments are created based on these coordinates. Let $D(v_i, v_j)$ denote the length of the longest directed path from node $v_i$ to node $v_j$, also called the detour distance between the nodes [4]. We define the depth of a node $v_j$ as the length of the longest directed path from the input nodes to the node, i.e.

$$d_j = \max_{v_i \in V_I} D(v_i, v_j). \tag{1}$$

Let $D = \max_j d_j$ be the maximal depth, also called the detour diameter in [4]. Then clearly each node has a depth between 0 and D. Thus the one-dimensional coordinate we assign to the nodes is their depth.

The decomposition is done by grouping the nodes based on their depths. A cut has value m if it separates nodes having depth less than m from nodes having depth greater than or equal to m. We denote the ith cut by $c_i$, thus $c_i \in \{1, 2, \ldots, D\}$. For example,
if $c_i = 3$, then it separates nodes that have depth less than 3 from nodes that have depth greater than or equal to 3. If we decompose the acyclic directed graph using cuts defined this way, then the result will also be acyclic.

Theorem 1: Suppose that we have a connected acyclic directed graph. Suppose that the graph is decomposed into two segments $S_1$ and $S_2$, such that nodes with depth less than a given number are in $S_1$ and nodes with depth greater than or equal to that number are in $S_2$. Then there is no directed edge from $S_2$ to $S_1$.

Proof: The proof is indirect. Suppose that there is a directed edge from $S_2$ to $S_1$, which means that there is a node $v_k \in S_2$ that has a directed edge to a node $v_l \in S_1$. The depth of node $v_k$ is $d_k$ as defined in (1). Since there is a directed edge from node $v_k$ to node $v_l$, the longest path from the inputs to $v_k$ can be extended by this edge, so the depth of node $v_l$ (denoted by $d_l$) is at least $d_k + 1$. This is a contradiction, because $d_l < d_k$ must hold due to the definition of the segments $S_1$ and $S_2$.

Using this method, one can decompose the graph into at most D + 1 segments with a guarantee that the result is acyclic. Suppose that we want to decompose the graph into k segments. To do this, we need to choose k - 1 cuts, so the decomposition is characterized by the set of cuts $\{c_1, c_2, \ldots, c_{k-1}\}$ with $c_j > c_i$ if $j > i$. If k = D + 1, then this set is trivial, i.e. it is $\{1, 2, \ldots, D\}$. But how should one choose this set if k < D + 1? If k is not fixed, then there are $2^D - 1$ possible solutions to choose from, while for fixed k there are $\binom{D}{k-1}$ possible solutions. The number of candidate solutions can thus grow exponentially with D, so exhaustive search is impractical. We propose an algorithm that finds a solution in polynomial time.

Figure 1: Illustration for the definition of segment $S_i$
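The depth assignment (1) and the property stated in Theorem 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: input nodes are taken to be the nodes without incoming edges, nodes are labeled 0..n-1, and the function names `node_depths` and `backward_edges` as well as the 6-node graph are hypothetical, not taken from the paper. The depths are computed by a longest-path pass in topological order.

```python
from collections import deque

def node_depths(n, edges):
    """Depth of each node: length of the longest directed path from any input node."""
    succ = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    depth = [0] * n
    # Assumption: input nodes are the nodes without incoming edges (depth 0).
    queue = deque(i for i in range(n) if indeg[i] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            # A longer path to u extends to a longer path to v.
            depth[v] = max(depth[v], depth[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != n:
        raise ValueError("graph contains a directed cycle")
    return depth

def backward_edges(edges, depth, m):
    """Edges that would run from S2 (depth >= m) back to S1 (depth < m)."""
    return [(u, v) for u, v in edges if depth[u] >= m and depth[v] < m]

# Hypothetical 6-node graph (0-based labels), not the graph of the paper's example.
edges = [(0, 2), (1, 2), (0, 3), (2, 3), (3, 4), (2, 5), (4, 5)]
depth = node_depths(6, edges)
print(depth)  # [0, 0, 1, 2, 3, 4]

# Theorem 1: no cut value m produces an edge from S2 back to S1.
for m in range(1, max(depth) + 1):
    assert backward_edges(edges, depth, m) == []
```

Note that the edge (0, 3) spans two depth levels, so it crosses both the cut of value 1 and the cut of value 2, yet it never points backward.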

We start with the set of cuts $\{c_1, c_2, \ldots, c_D\}$ and remove cuts until there are k - 1 cuts left. We evaluate every cut in the set by examining the segment that would result after removing the cut, including the edges of the cut. We define the segment $S_i$ as the union of the two segments that are the neighbors of the cut $c_i$. In other words, $S_i$ is the segment formed after removing the cut $c_i$, and it contains the edges along the cut $c_i$, as shown in Fig. 1. We evaluate each cut by the properties of the corresponding segment and of the edges that are absorbed into the segment when the cut is removed.

In each step of the algorithm, we evaluate the current cuts, select the cut whose removal is most beneficial, remove it from the list, and form the corresponding new segment by uniting the two segments that the chosen cut separated. We then redefine the cut list and the segments, evaluate the cuts again, and repeat the process until only k - 1 cuts are left. Since we have to evaluate D cuts in the first step, D - 1 cuts in the second step, D - 2 cuts in the third step, and so on, and finally k cuts in the last step, there are a total of $(D(D+1) - k(k-1))/2$ cut evaluations, so the algorithm is polynomial in both D and k.

Require: A connected acyclic directed graph G with n nodes, with the depth assigned to each node as in (1). The desired number of segments is k and the cost function to be minimized is f.
Ensure: A set of cuts based on the node depths that ensures an acyclic result and attempts to minimize the cost function f.
r := D. The initial set of cuts is $\{c_1, c_2, \ldots, c_D\} = \{1, 2, \ldots, D\}$.
while r > k - 1 do
  1. Create the segments $\{S_1, S_2, \ldots, S_r\}$ corresponding to the cuts $\{c_1, c_2, \ldots, c_r\}$.
  2. Evaluate each cut and choose the one that minimizes the cost function f, i.e. choose
     $$c = \arg\min_{c_i} f(c_i, S_i(c_i)). \tag{2}$$
  3. Remove c from the set of cuts, and recalculate the list of cuts.
  4. r := r - 1
end while

Figure 2: The decomposition algorithm that generates the cuts based on node depth and the cost function f.

We denote the cost function that we evaluate for each cut by f, and suppose that it depends only on the cut and the segment corresponding to the cut, and that it is to be minimized. The decomposition algorithm is then given in Fig. 2.

Suppose that we would like to get balanced segments with the cuts along a minimal number of edges. Suppose that the initial graph contains n nodes and we want k segments. Then the required number of nodes in a segment of a balanced decomposition, denoted by $|S|$, is

$$|S| = \frac{n}{k}. \tag{3}$$

The cost function should penalize a resulting segment that contains many more or many fewer nodes than $|S|$. The cuts remaining after the algorithm should cross a small number of edges, so the cost function should also penalize the removal of a cut that crosses a small number of edges. The cost function we use is

$$f(c_i, S_i(c_i)) = \begin{cases} k_l \left( \dfrac{|S|}{|S_i|} - 1 \right) + \dfrac{k_e}{|c_i|}, & \text{if } |S_i| \le |S|, \\[2ex] k_g \left( \dfrac{|S_i|}{|S|} - 1 \right) + \dfrac{k_e}{|c_i|}, & \text{if } |S_i| > |S|, \end{cases} \tag{4}$$

where $|S_i|$ is the number of nodes in the segment $S_i$ (the cardinality of $S_i$), while $|c_i|$ is the number of edges along the cut $c_i$ (we call it the cardinality of $c_i$). The notation $S_i(c_i)$ emphasizes that the segment $S_i$ depends on the cut $c_i$. The parameter $k_l$ penalizes a resulting segment with fewer than $|S|$ nodes, the parameter $k_g$ penalizes a resulting segment with more than $|S|$ nodes, and the parameter $k_e$ penalizes the removal of a cut along a small number of edges.

D. Example: Decomposition of a graph composed of 20 nodes

Consider the example in Fig. 3. The graph in the figure is an acyclic directed graph with 20 nodes and maximal depth D = 7. The task is to decompose it into an acyclic graph composed of three segments. The resulting segments will be denoted by $S_1$, $S_2$ and $S_3$, and will be defined by the sets of nodes they contain.
The decomposition will be done using the algorithm in Fig. 2 and the cost function (4) with different parameter settings. We will show the first step of the algorithm in the first cycle, i.e. give the list of cuts and their corresponding segments, together with the cardinalities of the segments and the cuts.
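As a worked instance of (3) and (4), take n = 20 and k = 3 as in this example, and assume the parameter values $k_l = 0.1$, $k_g = 0.5$, $k_e = 10$. The two candidate cuts below (a merged segment of 7 nodes over a cut of 6 edges, and a merged segment of 6 nodes over a cut of 5 edges) are chosen for illustration. The evaluation reads (4) as a balance penalty plus the edge term $k_e / |c_i|$, which is a reconstruction of the formula rather than a quotation.

```latex
\begin{align*}
|S| &= \frac{n}{k} = \frac{20}{3} \approx 6.667, \\
|S_i| = 7 > |S|,\ |c_i| = 6:\quad
f &= 0.5\left(\frac{7}{20/3} - 1\right) + \frac{10}{6} \approx 0.025 + 1.667 = 1.692, \\
|S_i| = 6 \le |S|,\ |c_i| = 5:\quad
f &= 0.1\left(\frac{20/3}{6} - 1\right) + \frac{10}{5} \approx 0.011 + 2.000 = 2.011.
\end{align*}
```

The first candidate has the lower cost and would be removed first. With a large $k_e$ the edge term dominates, so cuts that cross many edges are the preferred candidates for removal, while the balance term only breaks near-ties.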

The cuts in the first cycle of the algorithm are drawn as dashed horizontal lines in Fig. 3. The segments corresponding to the cuts are

$S_1 = \{v_1, v_2, v_3, v_4, v_5, v_6, v_7\}$
$S_2 = \{v_5, v_6, v_7, v_8, v_9, v_{10}\}$
$S_3 = \{v_8, v_9, v_{10}, v_{11}, v_{12}, v_{13}\}$
$S_4 = \{v_{11}, v_{12}, v_{13}, v_{14}\}$
$S_5 = \{v_{14}, v_{15}, v_{16}\}$
$S_6 = \{v_{15}, v_{16}, v_{17}, v_{18}, v_{19}\}$
$S_7 = \{v_{17}, v_{18}, v_{19}, v_{20}\}$.

The cardinalities of the segments (number of nodes in a segment) and the cardinalities of the cuts (number of edges along a cut) are

$|S_1| = 7$, $|S_2| = 6$, $|S_3| = 6$, $|S_4| = 4$, $|S_5| = 3$, $|S_6| = 5$, $|S_7| = 4$

and

$|c_1| = 6$, $|c_2| = 5$, $|c_3| = 3$, $|c_4| = 3$, $|c_5| = 2$, $|c_6| = 3$, $|c_7| = 3$.

The total number of nodes is 20, and since we want three segments, the ideal number of nodes in a segment of a balanced decomposition is $|S| = 20/3$.

First, the algorithm in Fig. 2 was run with the parameters $k_l = 0.1$, $k_g = 0.5$ and $k_e = 0$. Since $k_e = 0$, the algorithm does not consider the number of edges along the cuts; its only goal is to find balanced segments. The result of the algorithm is the decomposition in Fig. 4, with segments

$S_1 = \{v_1, v_2, v_3, v_4\}$
$S_2 = \{v_5, v_6, v_7, v_8, v_9, v_{10}, v_{11}, v_{12}, v_{13}\}$
$S_3 = \{v_{14}, v_{15}, v_{16}, v_{17}, v_{18}, v_{19}, v_{20}\}$.

The resulting cuts are $c_1 = 1$ and $c_2 = 4$, and their cardinalities are $|c_1| = 6$ and $|c_2| = 3$.

Second, the algorithm in Fig. 2 was run with the parameters $k_l = 0.1$, $k_g = 0.5$ and $k_e = 10$. In this case the number of edges along the cuts is also considered. The result of the decomposition is in Fig. 5, with the segments being

$S_1 = \{v_1, v_2, v_3, v_4, v_5, v_6, v_7, v_8, v_9, v_{10}\}$
$S_2 = \{v_{11}, v_{12}, v_{13}, v_{14}\}$
$S_3 = \{v_{15}, v_{16}, v_{17}, v_{18}, v_{19}, v_{20}\}$.

The resulting cuts are $c_1 = 3$ and $c_2 = 5$, and their cardinalities are $|c_1| = 3$ and $|c_2| = 2$.

In the second case, the algorithm chose to unite the nodes close to the inputs, since there are many edges between these nodes. In the first case, the algorithm did not consider the number of edges, so it placed the first cut along six edges, right after the inputs. The segments are similarly balanced in both cases; however, the total number of edges along the cuts is lower in the second case (5 instead of 9).

Figure 3: An example acyclic directed graph with 20 nodes. Each node has its depth already assigned, and the dashed lines are the cuts in the first cycle of the algorithm in Fig. 2.
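The two runs of the example can be retraced with a short Python sketch of the greedy cut removal of Fig. 2. Several things here are assumptions rather than the authors' implementation: the function names `cost` and `select_cuts` are hypothetical; the cost follows (4) read as a balance penalty plus $k_e / |c_i|$; ties between equally good cuts are broken in favor of the smallest cut value; and the graph is reduced to its depth profile, i.e. the number of nodes per depth (read off the first-cycle segment lists) and the number of edges along each cut, which does not change when other cuts are removed.

```python
def cost(merged_size, cut_edges, target, k_l, k_g, k_e):
    """Cost (4) of removing a cut: balance penalty of the merged segment plus edge term."""
    if merged_size <= target:
        balance = k_l * (target / merged_size - 1)
    else:
        balance = k_g * (merged_size / target - 1)
    return balance + k_e / cut_edges

def select_cuts(level_sizes, cut_edges, k, k_l, k_g, k_e):
    """Greedy cut removal on a depth profile.

    level_sizes[d] is the number of nodes of depth d; cut_edges[m] is the
    number of edges along the cut of value m (m = 1..D).
    """
    target = sum(level_sizes) / k            # equation (3)
    segments = list(level_sizes)             # start with one segment per depth
    cuts = list(range(1, len(level_sizes)))  # cuts[j] separates segments[j] and segments[j + 1]
    while len(cuts) > k - 1:
        # Evaluate the removal of every remaining cut; min() keeps the first
        # minimum, i.e. the smallest cut value on ties (an assumption).
        best = min(
            range(len(cuts)),
            key=lambda j: cost(segments[j] + segments[j + 1], cut_edges[cuts[j]],
                               target, k_l, k_g, k_e),
        )
        # Unite the two segments separated by the chosen cut and drop the cut.
        segments[best:best + 2] = [segments[best] + segments[best + 1]]
        del cuts[best]
    return cuts, segments

# Depth profile of the 20-node example: nodes per depth 0..7 and edges per cut 1..7.
level_sizes = [4, 3, 3, 3, 1, 2, 3, 1]
cut_edges = {1: 6, 2: 5, 3: 3, 4: 3, 5: 2, 6: 3, 7: 3}

balanced = select_cuts(level_sizes, cut_edges, k=3, k_l=0.1, k_g=0.5, k_e=0.0)
few_edges = select_cuts(level_sizes, cut_edges, k=3, k_l=0.1, k_g=0.5, k_e=10.0)
print(balanced)   # ([1, 4], [4, 9, 7])
print(few_edges)  # ([3, 5], [10, 4, 6])
```

Under these assumptions the sketch returns the cuts {1, 4} with segment sizes 4, 9, 7 for $k_e = 0$ and the cuts {3, 5} with segment sizes 10, 4, 6 for $k_e = 10$, which agrees with the decompositions reported above. Each cut between consecutive depths crosses at least one edge in a graph with depths defined by (1), so the division by the cut cardinality is safe for such inputs.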

Figure 4: The result of the decomposition algorithm in Fig. 2 applied to the graph in Fig. 3 with parameters $k_l = 0.1$, $k_g = 0.5$, $k_e = 0$.

Figure 5: The result of the decomposition algorithm in Fig. 2 applied to the graph in Fig. 3 with parameters $k_l = 0.1$, $k_g = 0.5$, $k_e = 10$.

3. Conclusion

We have presented a decomposition strategy based on the depth of the nodes (their detour distance from the input nodes) that guarantees loop-free decomposition results. This method does not parameterize all the possible acyclic decompositions; however, it gives good initial solutions that can be further manipulated with other algorithms, e.g. with an appropriately modified Kernighan-Lin algorithm. We have proposed an algorithm that selects the desired number of cuts using the presented strategy. The algorithm solves the problem in polynomial time while attempting to minimize a cost function assigned to the cuts. Note that the cost function used in the algorithm gives only local information about the graph, since it is based on the neighborhood of the current cut and does not characterize the result globally. Thus the result is not guaranteed to be optimal, but it can serve as an initial decomposition for further optimization algorithms.

Acknowledgements

The research work presented in this paper has been supported by the Hungarian Scientific Research Fund OTKA 72611, by the "Research University Project" TAMOP IKT T5 P3 and the research project TAMOP-4.2.2.C-11/1/KONV-2012-0004.

References

[1] Arató, P., Visegrády, T., and Jankovits, I., High-level Synthesis of Pipelined Datapaths, John Wiley & Sons, 2001.
[2] Hendrickson, B., and Leland, R., The Chaco User's Guide: Version 2.0, Sandia Tech Report SAND94-2692, 1994.
[3] Karypis, G., and Kumar, V., hMETIS - A Hypergraph Partitioning Package - Version 1.5.3, University of Minnesota, Department of Computer Science & Engineering, Minneapolis, USA, 1998.
[4] Arumugam, S., Sahul Hamid, I., and Abraham, V. M., "Decomposition of Graphs into Paths and Cycles", Journal of Discrete Mathematics, Hindawi Publishing Corporation, pp. 1-6, 2013.
[5] Kernighan, B., and Lin, S., "An efficient heuristic procedure for partitioning graphs", Bell System Technical Journal, 49, pp. 291-307, 1970.

[6] Hendrickson, B., and Leland, R., Multidimensional Spectral Load Balancing, SAND93-0074, Sandia National Laboratories, Albuquerque, NM, USA, 1993.
[7] Leland, R., and Hendrickson, B., "An empirical study of static load balancing algorithms", in Proceedings of the IEEE Scalable High-Performance Computing Conference, pp. 682-685, 1994.
[8] Sasaki, S., Nishihara, T., Ando, D., and Fujita, M., "Hardware/software co-design and verification methodology from system level based on system dependence graph", Journal of Universal Computer Science, vol. 13, no. 13, pp. 1972-2001, 2007.
[9] Arató, P., Mann, Z. A., and Orbán, A., "Algorithmic aspects of hardware/software partitioning", ACM Transactions on Design Automation of Electronic Systems, vol. 10, no. 1, pp. 136-156, 2005.
[10] Purnaprajna, M., Reformat, M., and Pedrycz, W., "Genetic algorithms for hardware-software partitioning and optimal resource allocation", Journal of Systems Architecture, vol. 53, no. 7, pp. 339-354, 2007.
[11] Hou, J., and Wolf, W., "Process partitioning for distributed embedded systems", in Proceedings of the 4th International Workshop on Hardware/Software Co-Design, ser. CODES '96, Washington, DC, USA: IEEE Computer Society, pp. 70-76, 1996.
[12] Arató, P., Drexler, D. A., and Kocza, G., "A Method for Avoiding Loops while Decomposing the Task Description Graph in System-Level Synthesis", in Proceedings of the 2014 IEEE 9th International Symposium on Applied Computational Intelligence and Informatics, Timisoara, Romania, pp. 231-235, 2014.
[13] Simon, H. D., "Partitioning of unstructured problems for parallel processing", Computing Systems in Engineering, 2, pp. 135-148, 1991.
[14] Williams, R., "Performance of dynamic load balancing algorithms for unstructured mesh calculations", Concurrency, 3, pp. 457-481, 1991.