
A Probability-Based Approach to VLSI Circuit Partitioning
Shantanu Dutt (1) and Wenyong Deng (2)
(1) Department of Electrical Engineering, University of Minnesota, Minneapolis, Minnesota 55455
(2) LSI Logic Corporation, Milpitas, CA 95035

OUTLINE
Problem Definition
Previous Partitioning Methods
Previous Iterative Improvement Methods
The New Probability-Based Partitioner (PROP)
Potential Node Gain Computation
Probability Computation
Results
Conclusions

Problem Definition: Min-Cut Partitioning
Given a netlist G representing a VLSI circuit, partition its nodes into two sets meeting size constraints such that the cost of the wires between the two partitions (the cutset) is minimized.
This results in densely connected modules being physically clustered on the chip: many short wires and fewer long wires, thus minimizing the wire area.
To lay out a VLSI circuit, recursively do min-cut partitioning into 2 halves along the X and Y dimensions (cut levels alternate) until there are 2 nodes in each partition.
[Figure: recursive min-cut bisection of a 16-node layout, with cut levels 1, 2 and 3 alternating between the X and Y dimensions.]
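A minimal Python sketch of the min-cut objective and the size constraint. It assumes a hypothetical netlist representation (a dict from net name to a pair of cost c(n) and node list) and a side map from each node to 1 (V1) or 2 (V2); the function names and the 45-55% default bounds are illustrative assumptions, not taken from the slides.

```python
def cutset(netlist, side):
    """Names of the nets that have nodes in both V1 and V2."""
    return {name for name, (_cost, nodes) in netlist.items()
            if len({side[v] for v in nodes}) == 2}

def cut_cost(netlist, side):
    """Min-cut objective: total cost c(n) of the nets in the cutset."""
    return sum(netlist[name][0] for name in cutset(netlist, side))

def is_balanced(side, lo=0.45, hi=0.55):
    """Size constraint (assumed form): the fraction of nodes in V1 lies in [lo, hi]."""
    frac_v1 = sum(1 for s in side.values() if s == 1) / len(side)
    return lo <= frac_v1 <= hi

# Hypothetical 4-node netlist: net name -> (cost c(n), nodes on the net).
NETLIST = {"n1": (1, ["a", "b"]), "n2": (2, ["b", "c"]),
           "n3": (1, ["c", "d"]), "n4": (1, ["a", "c", "d"])}
SIDE = {"a": 1, "b": 1, "c": 2, "d": 2}   # node -> 1 (V1) or 2 (V2)

print(sorted(cutset(NETLIST, SIDE)), cut_cost(NETLIST, SIDE))  # ['n2', 'n4'] 3
```

Recursive layout partitioning, as described above, would apply such a bisection repeatedly to each half, alternating the cut direction.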

Previous Partitioning Methods: VLSI Partitioning/Placement Methodologies & Examples
Iterative-improvement min-cut partitioning: good, fast. Examples: [Kernighan & Lin, Bell Syst. Tech. J., Feb. 1970] (KL), [Fiduccia & Mattheyses, Proc. DAC, 1982] (FM), and [Krishnamurthy, IEEE Trans. Comput., May 1984] (LA). Our work has transformed this approach to {very good, fast}.
Clustering followed by min-cut partitioning: very good, slow. Examples: [Wei & Cheng, ICCAD-89] (ratio-cut), [Hagen & Kahng, IEEE Trans. CAD, Sept. 1992] (EIG1), [Alpert and Kahng, ICCAD 94] (WINDOW), [Alpert and Yao, DAC 95] (MELO).
Simulated annealing and genetic algorithms: very good, very slow. Examples: [Sechen, Proc. of DAC, 1988], [Saab & Rao, Proc. DAC, 1989], [Shahookar & Mazumder, IEEE Trans. CAD, 1990].
Numerical optimization: medium, slow. Examples: [Mogaki et al., ICCAD, 1987]. More recently, [Riess, Doll and Johannes, DAC 94] (Paraboli) has put this approach in {very good, slow}.
In [Shahookar & Mazumder, ACM Comp. Surv., June 91]: "Iterative-improvement min-cut partitioning is the most cost-effective method."

Iterative-Improvement Algorithms: The KL/FM Algorithm
The gain of a node u (say, in V1) is defined as
$gain(u) := \sum_{n_i \in E_u} c(n_i) - \sum_{n_j \in I_u} c(n_j)$
where c(n) is the cost of net n, $E_u$ is the set of cutset nets connected only to u in V1, and $I_u$ is the set of nets connected to u that are not in the cutset. The gain can be positive or negative.
Example: $E_u$ = {n1, n2} and $I_u$ = {n4}, so gain(u) = 2 - 1 = 1.
[Figure: node u in V1 connected to nets n1, n2, n3 and n4, between partitions V1 and V2.]
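The gain definition can be checked on a small hypothetical netlist that reproduces $E_u$ = {n1, n2} and $I_u$ = {n4} with unit costs. Node names a-e are made up, and n3 is given one other V1 node and one V2 node so that it is in the cutset but not in $E_u$. A minimal Python sketch, assuming a dict from net name to (cost, node list) and a node-to-side map (1 or 2):

```python
def fm_gain(u, netlist, side):
    """gain(u) = sum of c(n) over E_u minus sum of c(n) over I_u."""
    gain = 0
    for cost, nodes in netlist.values():
        if u not in nodes or len(nodes) < 2:   # single-pin nets can never be cut
            continue
        same = sum(1 for v in nodes if side[v] == side[u])  # includes u itself
        if same == len(nodes):      # n in I_u: moving u would put n into the cutset
            gain -= cost
        elif same == 1:             # n in E_u: moving u removes n from the cutset
            gain += cost
    return gain

# Hypothetical netlist mirroring the example: u in V1, unit net costs.
NETLIST = {"n1": (1, ["u", "a"]), "n2": (1, ["u", "b"]),
           "n3": (1, ["u", "c", "d"]), "n4": (1, ["u", "e"])}
SIDE = {"u": 1, "a": 2, "b": 2, "c": 1, "d": 2, "e": 1}
print(fm_gain("u", NETLIST, SIDE))  # 1  (E_u = {n1, n2}, I_u = {n4})
```

Skipping single-pin nets is a choice made here so that such nets, which can never enter the cutset, do not count against any node.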

The KL/FM Algorithm (Contd.)
1. Generate an initial partition.
2. Pick the best "unlocked" node among both subsets to move if the balance condition (e.g., 45-55%) is met. Otherwise, pick the best unlocked node to move from the other subset.
3. Tentatively move and lock the node.
4. Update the gains of the neighbors of the moved node.
5. Repeat steps 2-4 until all nodes are locked.
6. Compute the prefix sums $S_u$ of the gains of all nodes u in order of move. Actually perform the moves up to node x, s.t. $S_x$ is the highest.
7. If $S_x > 0$: new partition = swapped partition; repeat steps 2-6. Else: new partition = old partition; exit.
[Figure: gains of the moved nodes and their prefix sums, in order of move; the actual moves are made up to the point where the prefix sum is highest.]
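The steps above can be sketched as a simplified pass in Python. Assumptions: the netlist is a dict from net name to (cost, node list), the balance condition is expressed as hypothetical bounds `min_size`/`max_size` on |V1|, and gains are recomputed by brute force as the drop in cut cost (which equals the KL/FM gain) instead of the incremental bucket or tree updates of a real implementation, so this sketch is much slower than FM.

```python
def cut_cost(netlist, side):
    """Sum of c(n) over nets whose nodes lie on both sides."""
    return sum(cost for cost, nodes in netlist.values()
               if len({side[v] for v in nodes}) == 2)

def move_gain(v, netlist, side):
    """Drop in cut cost if v switches sides (equal to its FM gain)."""
    flipped = dict(side)
    flipped[v] = 3 - side[v]
    return cut_cost(netlist, side) - cut_cost(netlist, flipped)

def fm_pass(netlist, side, min_size, max_size):
    """One pass (steps 2-6); returns (new_side, best prefix sum S_x)."""
    work, locked, moves, gains = dict(side), set(), [], []
    size1 = sum(1 for s in work.values() if s == 1)

    def legal(v):  # does moving v keep |V1| within [min_size, max_size]?
        return min_size <= size1 + (-1 if work[v] == 1 else 1) <= max_size

    while len(locked) < len(work):
        free = [v for v in work if v not in locked]
        best = max(free, key=lambda v: move_gain(v, netlist, work))
        if not legal(best):  # step 2: fall back to the other subset
            other = [v for v in free if work[v] != work[best]]
            if not other:
                break
            best = max(other, key=lambda v: move_gain(v, netlist, work))
            if not legal(best):
                break
        gains.append(move_gain(best, netlist, work))
        size1 += -1 if work[best] == 1 else 1
        work[best] = 3 - work[best]      # step 3: tentatively move ...
        locked.add(best)                 # ... and lock the node
        moves.append(best)

    # Step 6: prefix sums of the gains in move order; keep the best prefix.
    best_sum, best_len, running = 0, 0, 0
    for i, g in enumerate(gains, start=1):
        running += g
        if running > best_sum:
            best_sum, best_len = running, i
    new_side = dict(side)
    for v in moves[:best_len]:
        new_side[v] = 3 - new_side[v]
    return new_side, best_sum

def fm(netlist, side, min_size, max_size):
    """Step 7: repeat passes while the best prefix sum S_x is positive."""
    while True:
        side, improvement = fm_pass(netlist, side, min_size, max_size)
        if improvement <= 0:
            return side

# Hypothetical example: clusters {a, b} and {c, d}, badly split at the start.
NETLIST = {"n1": (1, ["a", "b"]), "n2": (1, ["a", "b"]), "n3": (1, ["c", "d"]),
           "n4": (1, ["c", "d"]), "n5": (1, ["b", "c"])}
SIDE = {"a": 1, "b": 2, "c": 1, "d": 2}
print(cut_cost(NETLIST, SIDE), cut_cost(NETLIST, fm(NETLIST, SIDE, 1, 3)))  # 5 1
```

In the first pass the move gains are 3, 1, -2, -2, so the prefix sums peak at 4 after two moves and only those two moves are kept; the second pass finds no positive prefix sum and stops.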

Lookahead (LA) Algorithm [Krishnamurthy, IEEE Trans. Comput., May 1984]
Each node u has a gain vector gain(u)[k] with k elements, where k is the degree of lookahead. Assume $u \in V_1$. The i-th element of the gain vector, $1 \le i \le k$, is
gain(u)[i] = (# of nets in the cutset connected to u that have i nodes in V1, including u) - (# of nets connected to u that have i - 1 nodes in V2)
For i = 1 this reduces to the FM gain (with unit net costs). Generally, the best performance is obtained for k = 2 to 4. The memory requirement is $\Theta(p_{max}^k)$, where $p_{max}$ is the maximum number of nets connected to a node.
Example: gain(u)[1] = 2 - 1 = 1, gain(u)[2] = 2, gain(u)[3] = 2 - 1 = 1.
[Figure: node u in V1 connected to nets n1, n2, n3 and n4, between partitions V1 and V2.]
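A minimal Python sketch of the gain vector as defined above, reading the definition literally. Assumptions: nets are counted (costs ignored, as in the "# of nets" definition), every node is treated as unlocked (the original LA method also accounts for locked nodes, which is omitted here), single-pin nets are skipped, and the example is the hypothetical FM-style netlist with $E_u$ = {n1, n2} and $I_u$ = {n4}, not the netlist drawn on the slide.

```python
def la_gain_vector(u, netlist, side, k):
    """gain(u)[1..k], returned as a 0-based list of length k."""
    vec = [0] * k
    for _cost, nodes in netlist.values():
        if u not in nodes or len(nodes) < 2:   # single-pin nets can never be cut
            continue
        same = sum(1 for v in nodes if side[v] == side[u])   # includes u
        other = len(nodes) - same
        if other > 0 and same <= k:   # cutset net with i = same nodes on u's side
            vec[same - 1] += 1
        if other + 1 <= k:            # net with i - 1 = other nodes on the far side
            vec[other] -= 1
    return vec

# Hypothetical netlist: u in V1; n3 has one other V1 node and one V2 node.
NETLIST = {"n1": (1, ["u", "a"]), "n2": (1, ["u", "b"]),
           "n3": (1, ["u", "c", "d"]), "n4": (1, ["u", "e"])}
SIDE = {"u": 1, "a": 2, "b": 2, "c": 1, "d": 2, "e": 1}
print(la_gain_vector("u", NETLIST, SIDE, 3))  # [1, -2, 0]
```

The first element matches the FM gain of 1 for this netlist; the later elements are what LA uses to break ties among nodes with equal FM gain.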

FM and LA Comparison
[Figure: FM gain example and LA gain example on the same network split between V1 and V2. Nodes 1, 2 and 3 all have an FM gain of 2; their LA gain vectors are (2,0,0) for node 1 and (2,0,1) for nodes 2 and 3.]
"Visual intuition", however, tells us that node 3 is the best one to move, followed by node 2, then node 1.
LA is better than FM, but not good enough.

The PRObabilistic Partitioner (PROP)
The idea is to estimate the potential gain of moving a node at the current time. This is done by computing node gains according to the probabilities of the connected nets being removed from the cutset. To obtain these net probabilities, we need the probabilities of the nodes actually being moved: a chicken-and-egg problem!
[Figure: node u (probability 0.9) in V1 with nets n1-n4; every other node is annotated with its move probability.]

PROP: Determining Node Probabilities
1. Either (a) compute the deterministic gains of the nodes according to FM, and assign node probabilities using a function f(g(u)), OR (b) assign a fixed probability of, say, 0.9, to each node.
2. Iterate the following 2 steps (1 or more times):
3. Compute the probabilistic gains $g_{n_i}(u)$ corresponding to each net $n_i$ connected to u, and then its total gain $g(u) = \sum_{u \in n_i} g_{n_i}(u)$.
4. Assign probabilities using f(g(u)).

PROP: Determining Node Probabilities (Contd.)
[Figure: the network of the FM and LA comparison, with each node annotated with (g(u), p(u)). (a) 1st iteration: nodes 1, 2 and 3 all have (2, 1). (b) 2nd iteration: node 1 has (2.0016, 1), node 2 has (2.04, 1) and node 3 has (2.64, 1), so the gains now rank node 3 first, then node 2, then node 1.]

PROP (Contd.)
The rest of the algorithm is as follows:
1. Pick the "unlocked" node with the highest g(u) among both subsets to move if the balance condition is met. Otherwise, pick the best unlocked node to move from the other subset.
2. Tentatively move and lock the node. Note the "immediate move gain".
3. Update the probabilities of the nets connected to the moved node, and the gains of its neighbors.
4. Repeat steps 1-3 until all nodes are locked.
5. Compute the prefix sums $S_u$ of the gains of all nodes u in order of move. Actually perform the moves up to node x, s.t. $S_x$ is the highest.
6. If $S_x > 0$: new partition = swapped partition; repeat steps 1-5. Else: new partition = old partition; exit.

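Node Gain Calculation: Nets in the Cutset
Let $u \in V_1$ be connected to net $n_i$ in the cutset, and let $n_i^r = n_i \cap V_r$, r = 1, 2.
[Figure: node u (probability 0.9) in V1 with nets n1-n4; the other nodes are annotated with their move probabilities.]
The gain $g_{n_i}(u)$ is defined as:
$g_{n_i}(u) = \Pr(n_i\,[1 \to 2] \mid u \text{ has been moved}) - \Pr(n_i\,[2 \to 1] \mid u \text{ is not moved})$
where $n_i\,[1 \to 2]$ denotes all nodes of $n_i$ in V1 moving to V2 (and $n_i\,[2 \to 1]$ the reverse), which removes $n_i$ from the cutset.
Using conditional probabilities and some approximations:
$g_{n_i}(u) \approx c(n_i)\left[\prod_{u_x \in n_i^1 \setminus \{u\}} p(u_x) - \prod_{u_y \in n_i^2} p(u_y)\right]$
Thus $g_{n_1}(u) = 1 - 0.25 = 0.75$, $g_{n_2}(u) = 1 - 0.7 = 0.3$, $g_{n_3}(u) = 0.64 - 0.6 = 0.04$.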

Node Gain Calculation: Net Not in the Cutset
[Figure: the same example as for nets in the cutset.]
In this case, $g_{n_i}(u)$ is intuitively negative:
$g_{n_i}(u) = -c(n_i) \cdot \Pr(n_i \text{ remains in the cutset after } u \text{ is moved})$
Again using conditional probabilities and approximations:
$g_{n_i}(u) \approx -c(n_i)\left(1 - \prod_{u_x \in n_i^1 \setminus \{u\}} p(u_x)\right)$
Thus $g_{n_4}(u) = -(1 - 0.14) = -0.86$.
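Both approximations can be checked against the slides' numbers with a short Python sketch. Assumptions: the netlist is a dict from net name to (cost, node list), node names a-h are hypothetical, and their sides and probabilities were chosen to reproduce the example (n1 reaches two V2 nodes with p = 0.5, n2 one V2 node with p = 0.7, n3 two V1 nodes with p = 0.8 and one V2 node with p = 0.6, n4 two V1 nodes with p = 0.2 and 0.7).

```python
from math import prod

def prop_net_gain(u, cost, nodes, side, p):
    """Approximate probabilistic gain g_{n_i}(u) of one net for node u."""
    near = [v for v in nodes if v != u and side[v] == side[u]]  # n_i^1 \ {u}
    far = [v for v in nodes if side[v] != side[u]]              # n_i^2
    p_rest_move = prod(p[v] for v in near)  # 1.0 when u is alone on its side
    if far:                                 # net currently in the cutset
        return cost * (p_rest_move - prod(p[v] for v in far))
    return -cost * (1 - p_rest_move)        # net currently not in the cutset

def prop_gain(u, netlist, side, p):
    """Total gain g(u): the sum of g_{n_i}(u) over the nets connected to u."""
    return sum(prop_net_gain(u, cost, nodes, side, p)
               for cost, nodes in netlist.values() if u in nodes)

NETLIST = {"n1": (1, ["u", "a", "b"]), "n2": (1, ["u", "c"]),
           "n3": (1, ["u", "d", "e", "f"]), "n4": (1, ["u", "g", "h"])}
SIDE = {"u": 1, "a": 2, "b": 2, "c": 2, "d": 1, "e": 1, "f": 2, "g": 1, "h": 1}
P = {"u": 0.9, "a": 0.5, "b": 0.5, "c": 0.7, "d": 0.8, "e": 0.8,
     "f": 0.6, "g": 0.2, "h": 0.7}

for name, (cost, nodes) in NETLIST.items():
    print(name, round(prop_net_gain("u", cost, nodes, SIDE, P), 2))
# n1 0.75 / n2 0.3 / n3 0.04 / n4 -0.86
print(round(prop_gain("u", NETLIST, SIDE, P), 2))  # 0.23
```

Note that the node's own probability p(u) does not enter its gain; it only affects the gains of u's neighbors.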

PROP Is Not Only a Tie-Breaking Extension of FM
It is a completely new gain calculation method.
[Figure: example nodes annotated with their FM gains (0 to 3), node probabilities p(u) and PROP gains g(u) (e.g., 1.98, 0.99 and 1.79).]

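Calculating Node Probabilities
We need a monotonically increasing function of the node gains g(u). A choice that works well is to threshold the node gains at $g_{up}$ (say, 1.5) and $g_{low}$ (say, -1): nodes with $g(u) \ge g_{up}$ get probability $p_{max}$, nodes with $g(u) \le g_{low}$ get $p_{min}$, and the probabilities of all other nodes are computed using the probability function.
[Figure: probability vs. gain; linear and semi-Gaussian probability functions rising from $p_{min}$ at $g_{low}$ to $p_{max}$ at $g_{up}$.]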
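A minimal Python sketch of the linear variant of f, together with the alternation of steps 3 and 4 from the node-probability procedure (starting from option 1(b), a fixed probability for every node). Assumptions: $p_{max}$ = 0.95 and $p_{min}$ = 0.1 are hypothetical values (the slides do not give them), the semi-Gaussian variant is not shown, and `gain_fn` is a hypothetical callback that returns g(u) for the current probabilities, for example a probabilistic gain routine built from the two net-gain approximations.

```python
G_UP, G_LOW = 1.5, -1.0     # thresholds suggested on the slide
P_MAX, P_MIN = 0.95, 0.1    # hypothetical values; the slides do not give them

def f_linear(g):
    """Thresholded, monotonically increasing map from gain g(u) to p(u)."""
    if g >= G_UP:
        return P_MAX
    if g <= G_LOW:
        return P_MIN
    return P_MIN + (P_MAX - P_MIN) * (g - G_LOW) / (G_UP - G_LOW)

def refine_probabilities(nodes, gain_fn, f=f_linear, p0=0.9, iterations=2):
    """Option 1(b), then steps 3-4: start every node at probability p0 and
    alternate gain computation with probability assignment."""
    p = {u: p0 for u in nodes}
    g = {}
    for _ in range(iterations):
        g = {u: gain_fn(u, p) for u in nodes}   # step 3: gains from current p
        p = {u: f(g[u]) for u in nodes}         # step 4: p(u) = f(g(u))
    return g, p

# Toy usage with fixed gains: x is above g_up, z below g_low, y halfway between.
toy = {"x": 2.0, "y": 0.25, "z": -3.0}
g, p = refine_probabilities(toy, lambda u, p: toy[u])
print(p["x"] == P_MAX, p["z"] == P_MIN)  # True True
```

With a real gain routine, each iteration feeds the new probabilities back into the net gains, which is how the 2nd-iteration gains on the earlier example come to differ from the 1st.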

Time and Space Complexities
Initial probability and gain calculations: $O(nd)$.
Choosing the best node takes constant time, thus a total of $\Theta(n)$ for an entire pass.
Updating p nets and d neighbors per moved node takes $\Theta(p + d)$ time; the total is $\Theta(nd)$.
Reinsertion of each neighbor in the balanced binary search tree takes $\Theta(\log n)$ time; the total update time is $\Theta(nd \log n)$.
Thus the time complexity of PROP is $\Theta(nd \log n)$.
The space complexity is $\Theta(nd)$ (net and node incidence lists).

Summary of Results
Benchmarks: ACM/SIGDA suite, from 801 nodes and 735 nets to 12637 nodes and 13419 nets.
Cutset results, % improvement of PROP over the previous algorithm:
Case 50-50%: FM20 30; FM100 22.3; LA2-40 27.3; LA3-20 16.6
Case 45-55%: EIG1 57.1; MELO 19.9; Paraboli 15; WINDOW 25.9
Timing results, speedup of PROP over the previous algorithm:
FM100-bucket 0.98; FM100-tree 2.9; LA2-40 0.99; LA3-20 2.24; WINDOW 1.53; EIG1 0.73; MELO 2.17; Paraboli 3.90

Conclusions and Future Work
Presented a new approach, PROP, to min-cut partitioning using a probability-based gain calculation.
Achieves very good cutsets compared to previous iterative schemes as well as other state-of-the-art schemes.
It is quite fast: only twice as slow per run as FM-tree, and much faster than other recent state-of-the-art schemes.
Iterative-improvement type schemes need not get caught in a local minimum if move decisions are sophisticated (i.e., capture global info); expensive statistical methods are not always needed.
Many variations of basic PROP are possible.
We will adapt and extend PROP to achieve the following: k-way partitioning; timing optimization of circuits; FPGA and multiple-FPGA/chip mapping of large systems.