
A Probability-Based Approach to VLSI Circuit Partitioning

Shantanu Dutt (1) and Wenyong Deng (2)
(1) Department of Electrical Engineering, University of Minnesota, Minneapolis, Minnesota 55455
(2) LSI Logic Corporation, Milpitas, CA 95035

OUTLINE
- Problem Definition
- Previous Partitioning Methods
- Previous Iterative Improvement Methods
- The New Probability-Based Partitioner (PROP)
- Potential Node Gain Computation
- Probability Computation
- Results
- Conclusions

Problem Definition: Min-Cut Partitioning
- Problem Definition: Given a netlist G representing a VLSI circuit, partition its nodes into two sets meeting size constraints s.t. the cost of wires between the two partitions (the cutset) is minimized
- To lay out a VLSI circuit, recursively do min-cut partitioning into 2 halves, along X and Y dimensions, until there are 2 nodes in each partition. Cut levels alternate
- Results in densely connected modules being physically clustered on chip. Many short wires and fewer long wires, thus minimizing the wire area
[Figure: a chip area recursively bisected into 16 numbered cells, with the cut lines labeled by cut level 1, 2 and 3.]

VLSI Partitioning/Placement Methodologies
- Iterative-improvement min-cut partitioning: Good, fast
  Examples: [Kernighan & Lin, Bell Syst. J., Feb. 1970] (KL), [Fiduccia & Mattheyses, Proc. DAC, 1982] (FM), and [Krishnamurthy, IEEE Trans. Comput., May 1984] (LA).
  Our work has transformed this approach to {very good, fast}
- Clustering followed by min-cut partitioning: Very good, slow
  Examples: [Wei & Cheng, ICCAD-89] (ratio-cut), [Hagen & Kahng, IEEE Trans. CAD, Sept. 1992] (EIG1), [Alpert and Kahng, ICCAD 94] (WINDOW), [Alpert and Yao, DAC 95] (MELO).
- Simulated annealing and Genetic algorithm: Very good, very slow
  Examples: [Sechen, Proc. of DAC, 1988], [Saab & Rao, Proc. DAC, 1989], [Shahookar & Mazumder, IEEE Trans. CAD, 1990]
- Numerical optimization: Medium, slow
  Examples: [Mogaki et al., ICCAD, 1987]. More recently, [Riess, Doll and Johannes, DAC 94] (Paraboli) has put this approach in {very good, slow}
- In [Shahookar & Mazumder, ACM Comp. Surv., June 91]: "Iterative-improvement min-cut partitioning is the most cost-effective method"

Iterative-Improvement Algorithms: The KL/FM Algorithm
- The gain of a node u (say, in $V_1$) is defined as
  $$\mathrm{gain}(u) := \sum_{n_i \in E_u} c(n_i) - \sum_{n_j \in I_u} c(n_j)$$
- $E_u$ is the set of cutset nets connected only to u in $V_1$
- $I_u$ is the set of nets connected to u that are not in the cutset
- The gain can be positive or negative
[Figure: node u in V1 connected to nets n1, n2, n3 and n4. Eu = {n1, n2}, Iu = {n4}, gain(u) = 2 - 1 = 1.]
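The gain definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary-based netlist, the helper name `fm_gain`, and the neighbor names `a` to `e` are assumptions chosen so that the result matches the slide's example (Eu = {n1, n2}, Iu = {n4}).

```python
def fm_gain(u, nets, side, cost=None):
    """gain(u) = sum of c(n) over E_u minus sum of c(n) over I_u."""
    gain = 0
    for name, members in nets.items():
        if u not in members:
            continue
        c = 1 if cost is None else cost[name]  # unit net cost by default
        sides = {side[v] for v in members}
        if len(sides) == 1:
            gain -= c  # net not in the cutset: belongs to I_u
        elif all(side[v] != side[u] for v in members if v != u):
            gain += c  # cut net with u alone on its side: belongs to E_u
        # any other cut net (such as n3 here) does not affect the gain
    return gain


# Hypothetical netlist reproducing the slide's example; u is in V1.
nets = {
    "n1": {"u", "a"},
    "n2": {"u", "b"},
    "n3": {"u", "c", "d"},
    "n4": {"u", "e"},
}
side = {"u": 1, "a": 2, "b": 2, "c": 1, "d": 2, "e": 1}

print(fm_gain("u", nets, side))  # 1
```

Net n3 stays in the cutset whether or not u moves, because c remains in V1, so it contributes nothing to the deterministic gain.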

The KL/FM Algorithm (Contd.)
1. Generate an initial partition
2. Pick best "unlocked" node among both subsets to move if the balance condition (e.g., 45-55%) is met. Otherwise, pick best unlocked node to move from the other subset
3. Tentatively move and lock the node
4. Update gains of the neighbors of swapped node
5. Repeat steps 2-4 until all nodes are locked
6. Compute the prefix sums Su's of gains of all nodes u in order of move. Actually perform swaps till node x, s.t. Sx is the highest
7. If Sx > 0: new partition = swapped partition; repeat steps 2-6
   else: new partition = old partition; exit
[Figure: table listing the gain of each moved node and the running prefix sum, with the marker "Make actual moves till this point" at the highest prefix sum.]
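Steps 6 and 7 amount to a maximum-prefix-sum search over the gains recorded during the tentative moves. A minimal sketch follows; the function name `best_prefix` and the gain sequence are made up for illustration and are not the values from the slide's table.

```python
from itertools import accumulate


def best_prefix(move_gains):
    """Return (count, best_sum): how many tentative moves to make real.

    count == 0 means no prefix has a positive sum, so the old partition
    is kept and the algorithm exits (step 7).
    """
    best_count, best_sum = 0, 0
    for count, prefix_sum in enumerate(accumulate(move_gains), start=1):
        if prefix_sum > best_sum:  # strict: ties keep the shorter prefix
            best_count, best_sum = count, prefix_sum
    return best_count, best_sum


# Illustrative gains in order of move; prefix sums are 2, 3, 2, 5, 3, 3.
gains_in_move_order = [2, 1, -1, 3, -2, 0]
print(best_prefix(gains_in_move_order))  # (4, 5)
```

Accepting a temporary loss (the third move here) is what lets a pass climb out of a shallow local minimum.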

The Lookahead (LA) Algorithm [Krishnamurthy, IEEE Trans. Comput., May 1984]
- Each node u has a gain vector gain(u)[k] with k elements; k is the degree of lookahead. Assume $u \in V_1$.
- The ith element of the gain vector, $1 \le i \le k$:
  gain(u)[i] = (# of nets in the cutset that are connected to i nodes in V1 including u) - (# of nets in the cutset connected to u that have i-1 nodes in V2)
- Generally, best performance is obtained for k = 2 to 4. Memory requirement is $\Theta(p_{max}^k)$.
[Figure: node u in V1 connected to nets n1-n4, with gain(u)[1] = 2 - 1 = 1, gain(u)[2] = 2, gain(u)[3] = 2 - 1 = 1.]

FM and LA Comparison
[Figure: two panels, "FM Gain Example" and "LA Gain Example", showing the same circuit (nodes 1-11, nets n1-n17) split between V1 and V2. FM assigns gain 2 to each of nodes 1, 2 and 3; LA assigns gain vector (2,0,0) to node 1 and (2,0,1) to nodes 2 and 3.]
- "Visual intuition", however, tells us that node 3 is the best one to move, followed by node 2, then node 1
- LA is better than FM, but not good enough

The PRObabilistic Partitioner (PROP)
- Idea is to get an estimate of the potential gain of moving a node at the current time.
- Done by computing node gains according to the probabilities of removing connected nets from the cutset
- To obtain these net probabilities, we need probabilities of nodes being actually moved: a chicken-and-egg problem!
[Figure: node u in V1 connected to nets n1-n4, with a move probability annotated on every node.]

PROP: Determining Node Probabilities
1. Either
   (a) Compute deterministic gains of nodes according to FM, and assign node probabilities using a function f(g(u))
   OR
   (b) Assign a fixed probability of, say, 0.9, to each node
2. Iterate the following 2 steps (1 or more times):
3. Compute probabilistic gains $g_{n_i}(u)$ corresponding to each net $n_i$ connected to u, and then its total gain $g(u) = \sum_{n_i \ni u} g_{n_i}(u)$
4. Assign probabilities using f(g(u))
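The bootstrap described above resolves the chicken-and-egg problem by alternating between gains and probabilities. The sketch below shows only that control flow, using option 1(b) as the starting point. The callables `gain_fn` and `prob_fn` are hypothetical placeholders for the net-based gain computation and the function f, and the toy stand-ins in the usage are not PROP's formulas.

```python
def bootstrap_probabilities(nodes, gain_fn, prob_fn, p_init=0.9, iterations=2):
    """Alternate gain and probability updates (steps 2-4 of the slide)."""
    p = {u: p_init for u in nodes}  # option 1(b): fixed initial probability
    g = {}
    for _ in range(iterations):
        # Step 3: gains are computed from the current probabilities.
        g = {u: gain_fn(u, p) for u in nodes}
        # Step 4: probabilities are re-derived from the new gains.
        p = {u: prob_fn(g[u]) for u in nodes}
    return g, p


# Toy stand-ins, assumptions for illustration only.
toy_weight = {"x": 2.0, "y": 0.5}


def toy_gain(u, p):
    other = "y" if u == "x" else "x"
    return toy_weight[u] * p[other]


def toy_prob(g):
    return min(1.0, max(0.1, g / 2.0))


g, p = bootstrap_probabilities(["x", "y"], toy_gain, toy_prob)
print(g, p)
```

Each iteration updates all nodes from the same snapshot of probabilities; whether the original implementation updates synchronously is not stated on the slide.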

PROP: Determining Node Probabilities (Contd.)
[Figure: the example circuit (nodes 1-11, nets n1-n17, partitions V1 and V2) with each node annotated by its pair g(u), p(u). (a) 1st Iteration: nodes 1, 2 and 3 each show 2, 1. (b) 2nd Iteration: node 1 shows 2.0016, 1; node 2 shows 2.04, 1; node 3 shows 2.64, 1.]

PROP (Contd.)
The rest of the algorithm is as follows:
1. Pick "unlocked" node with highest g(u) among both subsets to move if the balance condition is met. Otherwise, pick best unlocked node to move from the other subset
2. Tentatively move and lock the node. Note the "immediate move gain"
3. Update probabilities of nets connected to moved node, and the gains of its neighbors
4. Repeat steps 1-3 until all nodes are locked
5. Compute the prefix sums Su's of gains of all nodes u in order of move. Actually perform swaps till node x, s.t. Sx is the highest
6. If Sx > 0: new partition = swapped partition; repeat steps 1-5
   else: new partition = old partition; exit
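Step 1 can be illustrated with a small selection routine. This is a sketch under stated assumptions: the name `pick_node`, the dictionary inputs, and the 45-55% bounds (borrowed from the KL/FM slide) are illustrative, and a plain sort stands in for the balanced binary search tree mentioned in the complexity analysis.

```python
def pick_node(gains, side, locked, lo=0.45, hi=0.55):
    """Best unlocked node whose move keeps |V1| / n within [lo, hi].

    With two subsets, skipping every candidate that breaks the balance is
    the same as falling back to the best node of the other subset.
    """
    n = len(side)
    size1 = sum(1 for s in side.values() if s == 1)
    candidates = sorted(
        (u for u in gains if u not in locked), key=gains.get, reverse=True
    )
    for u in candidates:
        new_size1 = size1 - 1 if side[u] == 1 else size1 + 1
        if lo <= new_size1 / n <= hi:
            return u
    return None  # every remaining move would violate the balance


# Illustrative data: 9 nodes in V1, 11 in V2, so V1 cannot give up a node.
side = {f"v{i}": (1 if i < 9 else 2) for i in range(20)}
gains = {u: 0.0 for u in side}
gains["v0"] = 3.0  # highest gain, but in V1
gains["v9"] = 2.0  # best node in V2

print(pick_node(gains, side, locked=set()))  # v9
```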

Node Gain Calculation: Nets In the Cutset
- Let $u \in V_1$ be connected to net $n_i$ in the cutset; $n_i^r = n_i \cap V_r$, $r = 1, 2$.
[Figure: node u in V1 connected to nets n1-n4, with a move probability annotated on every node.]
- Gain $g_{n_i}(u)$ is defined as:
  $g_{n_i}(u)$ = (Probability of $n_i[1 \to 2]$ given that u has been moved) - (Probability of $n_i[2 \to 1]$ given that u is not moved)
- Using conditional probabilities and some approximations:
  $$g_{n_i}(u) \approx c(n_i)\Big[\prod_{u_x \in n_i^1 \setminus \{u\}} p(u_x) - \prod_{u_y \in n_i^2} p(u_y)\Big]$$
- Thus $g_{n_1}(u) = 1 - 0.25 = 0.75$, $g_{n_2}(u) = 1 - 0.7 = 0.3$, $g_{n_3}(u) = 0.64 - 0.6 = 0.04$.
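The approximate formula for cutset nets can be checked numerically against the three values on the slide. In the sketch below, the node names `a` to `f` and the assignment of the figure's probabilities to particular nets are assumptions inferred from the worked arithmetic (0.5 x 0.5 = 0.25, 0.8 x 0.8 = 0.64); the helper name `cut_net_gain` is also hypothetical.

```python
from math import prod


def cut_net_gain(u, net, side, p, cost=1.0):
    """g_n(u) ~ c(n) * [prod of p over u's side except u - prod of p over other side]."""
    same_side = prod(p[v] for v in net if v != u and side[v] == side[u])
    other_side = prod(p[v] for v in net if side[v] != side[u])
    return cost * (same_side - other_side)  # an empty product equals 1


# Assumed layout of the slide's figure; u is in V1.
side = {"u": 1, "a": 2, "b": 2, "c": 2, "d": 1, "e": 1, "f": 2}
p = {"u": 0.9, "a": 0.5, "b": 0.5, "c": 0.7, "d": 0.8, "e": 0.8, "f": 0.6}
nets = {
    "n1": ["u", "a", "b"],       # u alone in V1, two V2 nodes at 0.5
    "n2": ["u", "c"],            # u alone in V1, one V2 node at 0.7
    "n3": ["u", "d", "e", "f"],  # two more V1 nodes at 0.8, one V2 node at 0.6
}

net_gains = {name: cut_net_gain("u", members, side, p) for name, members in nets.items()}
for name, g in net_gains.items():
    print(f"{name}: {g:.2f}")
# n1: 0.75
# n2: 0.30
# n3: 0.04
```

Unlike the FM gain, net n3 now contributes a small positive amount, because the two other V1 nodes on it are themselves likely to move.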

Node Gain Calculation: Net Not in Cutset
[Figure: node u in V1 connected to nets n1-n4, with a move probability annotated on every node.]
- In this case, $g_{n_i}(u)$ is intuitively negative:
  $g_{n_i}(u) = -c(n_i)\,$(Prob. that $n_i$ remains in the cutset after u is moved)
- Again using conditional probabilities and approximations:
  $$g_{n_i}(u) \approx -c(n_i)\Big(1 - \prod_{u_x \in n_i^1 \setminus \{u\}} p(u_x)\Big)$$
- Thus $g_{n_4}(u) = -(1 - 0.14) = -0.86$
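The non-cutset case is a one-line computation. The sketch below reproduces the slide's value for n4, assuming that the two other nodes on n4 carry probabilities 0.2 and 0.7 (inferred from 0.2 x 0.7 = 0.14); the node names `g` and `h` and the function name are hypothetical. The total adds the three cutset-net values quoted on the preceding slide.

```python
from math import prod


def uncut_net_gain(u, net, p, cost=1.0):
    """g_n(u) ~ -c(n) * (1 - prod of p over the other nodes on the net).

    The product approximates the chance that every other node on the net
    follows u, which would take the net back out of the cutset.
    """
    all_follow = prod(p[v] for v in net if v != u)
    return -cost * (1.0 - all_follow)


# Assumed probabilities for net n4, which lies entirely in V1.
p = {"u": 0.9, "g": 0.2, "h": 0.7}
g_n4 = uncut_net_gain("u", ["u", "g", "h"], p)
print(f"{g_n4:.2f}")  # -0.86

# Total gain of u: cutset-net gains 0.75, 0.3 and 0.04 plus g_n4.
total_gain = 0.75 + 0.3 + 0.04 + g_n4
print(f"{total_gain:.2f}")  # 0.23
```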

PROP is Not Only a Tie-Breaking Extension of FM
- It is a completely new gain calculation method
[Figure: example circuit annotated with FM gains (0 to 3), node probabilities p(u) and PROP gains g(u).]

Calculating Node Probabilities
- Need a monotonically increasing function of node gains g(u)
- A caveat that works well is applying thresholding [$g_{up}$ (say, = 1.5), $g_{low}$ (say, = -1)] on node gains
- Probabilities of all other nodes are computed using the probability function
[Figure: probability versus gain, rising from $p_{min}$ at $g_{low}$ to $p_{max}$ at $g_{up}$; two candidate curves, Semi Gaussian and Linear.]
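A thresholded, monotonically increasing probability function of the linear kind could look like the sketch below. The thresholds 1.5 and -1 come from the slide; the values chosen for `p_min` and `p_max` and the function name are assumptions, since the slide gives no numbers for them, and the semi-Gaussian variant is not shown.

```python
def node_probability(g, g_low=-1.0, g_up=1.5, p_min=0.1, p_max=0.95):
    """Linear map from gain to move probability, clamped at the thresholds.

    p_min and p_max are illustrative defaults, not values from the talk.
    """
    if g >= g_up:
        return p_max
    if g <= g_low:
        return p_min
    # Linear interpolation between (g_low, p_min) and (g_up, p_max).
    return p_min + (p_max - p_min) * (g - g_low) / (g_up - g_low)


for gain in (-3.0, -1.0, 0.25, 1.5, 2.64):
    print(gain, round(node_probability(gain), 3))
```

Clamping keeps a single very large or very negative gain from pushing a probability to exactly 0 or 1, which would otherwise zero out every product it takes part in.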

Time and Space Complexities
- Initial probability and gain calculations: O(nd)
- Choosing the best node takes constant time; thus total of $\Theta(n)$ for entire pass
- Updating p nets and d neighbors per moved node takes time $\Theta(p + d)$; total is $\Theta(nd)$.
- Reinsertion of each neighbor in the balanced binary search tree takes log n time; total updating time is $\Theta(nd \log n)$
- Thus time complexity of PROP is $\Theta(nd \log n)$
- Space complexity is $\Theta(nd)$ (net and node incidence lists)

Summary of Results
ACM/SIGDA suite: 801 nodes, 735 nets to 12637 nodes, 13419 nets

Results:

Case 50-50%
Previous Algorithm    % Impr. of PROP
FM20                  30
FM100                 22.3
LA2-40                27.3
LA3-20                16.6

Case 45-55%
Previous Algorithm    % Impr. of PROP
EIG1                  57.1
MELO                  19.9
Paraboli              15
WINDOW                25.9

Timing Results:
Previous Algorithm    Speedup of PROP
FM100-bucket          0.98
FM100-tree            2.9
LA2-40                0.99
LA3-20                2.24
WINDOW                1.53
EIG1                  0.73
MELO                  2.17
Paraboli              3.90

Conclusions and Future Work
- Presented a new approach, PROP, to min-cut partitioning using a probability-based gain calculation
- Achieves very good cutsets compared to previous iterative as well as other state-of-the-art schemes
- It is quite fast: only twice as slow per run as FM-tree. It is much faster than other recent state-of-the-art schemes
- Iterative-improvement type schemes need not get caught in a local minimum if move decisions are sophisticated (i.e., capture global info); expensive statistical methods are not always needed
- Many variations of basic PROP possible
- We will adapt and extend PROP to achieve the following:
  - k-way partitioning
  - Timing optimization of circuits
  - FPGA and multiple-FPGA/chip mapping of large systems