ESE535: Electronic Design Automation. Day 4: January 28, 2015. Partitioning (Intro, KLFM)


Preclass Warmup
- What cut size were you able to achieve?

Today
- Partitioning
  - why important
  - can be used as a tool at many levels
  - practical attack
  - variations and issues
[Figure: design flow -- Behavioral (C, MATLAB, ...) -> Arch. Select / Schedule -> RTL -> FSM assign / Two-level, Multilevel opt. / Covering / Retiming -> Gate Netlist -> Placement -> Routing -> Layout -> Masks]

Motivation (1)
- Cut size (bandwidth) can determine area, energy
- Minimizing cuts
  - minimizes interconnect requirements
  - increases signal locality
- Chip (board) partitioning: minimize IO
- Direct basis for placement
  - particularly for our heterogeneous multicontext computing array
[Figure: the same design flow diagram]

Motivation (2)
- Divide-and-conquer
  - trivial case: decomposition
  - smaller problems easier to solve
  - net win, if super-linear: Part(n) + 2·T(n/2) < T(n) (sketch below)
  - problems with sparse connections or interactions
- Exploit structure
  - limited cutsize is a common structural property
  - random graphs would not have as small cuts

Bisection Width
- Partition the design into two equal-size halves (N/2, N/2)
- Minimize wires (nets) with ends in both halves
- The number of wires crossing is the bisection width
- Lower bisection width = more locality
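The divide-and-conquer condition above is easy to sanity-check numerically. A minimal sketch, assuming purely illustrative costs (a quadratic direct solver T(n) = n² and a linear-cost partitioner Part(n) = 10n; neither is taken from the slides):

```python
# Sketch only: checking Part(n) + 2*T(n/2) < T(n) for assumed illustrative costs.

def T(n):
    """Assumed cost of solving a size-n problem directly (superlinear)."""
    return n ** 2

def Part(n):
    """Assumed cost of partitioning a size-n problem."""
    return 10 * n

for n in (100, 1000, 10000):
    divided = Part(n) + 2 * T(n // 2)
    print(n, divided, T(n), divided < T(n))
# n = 100: 1000 + 2*2500 = 6000 < 10000, and the gap widens as n grows,
# which is exactly the "net win, if super-linear" condition.
```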

Interconnect Area
- Bisection width is a lower bound on IC width
- When wire dominated, it may be a tight bound (recursively)

Classic Partitioning Problem
- Given: netlist of interconnected cells
- Partition into two (roughly) equal halves (A, B)
- Minimize the number of nets shared by the halves
- "Roughly equal" balance condition: (0.5 − δ)N ≤ |A| ≤ (0.5 + δ)N

Balanced Partitioning
- NP-complete for general graphs [ND17: Minimum Cut into Bounded Sets, Garey and Johnson]
  - reduce from SIMPLE MAX CUT
  - reduce MAXIMUM 2-SAT to SMC
- Unbalanced partitioning: poly time
- Many heuristics/attacks

KLFM Partitioning Heuristic
- Greedy, iterative
  - pick a cell that decreases the cut and move it
  - repeat
- Small amount of non-greediness
  - look past moves that make things locally worse
  - randomization

Fiduccia-Mattheyses (Kernighan-Lin refinement) (one-pass sketch below)
- Start with two halves (random split?)
- Repeat until no updates:
  - Start with all cells free
  - Repeat until no cells free:
    - Move the free cell with the largest gain (that balance allows)
    - Update costs of its neighbors
    - Lock the cell in place (record current cost)
  - Pick the least-cost point in the previous move sequence and use it as the next starting position
- Repeat for different random starting points

Efficiency
- Tricks to make it efficient:
  - expend little work picking the move candidate
    - constant work, O(1): the amount of work does not depend on problem size
  - update costs on a move cheaply [O(1)]
  - efficient data structure
    - cost updates cheap
    - cheap to find the next move
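A minimal sketch of one FM pass following the outline above; it is not the course's reference code. Assumptions: the netlist is a list of nets (each a list of cell ids), side maps each cell to 0 or 1, and gains are recomputed naively on every move (the O(1) update and bucket structures come later in the lecture).

```python
def cut_size(nets, side):
    # number of nets with cells on both sides
    return sum(1 for net in nets if len({side[c] for c in net}) == 2)

def gain(cell, nets, side):
    # gain = nets uncut by moving this cell - nets newly cut by moving it
    g = 0
    for net in nets:
        if cell not in net or len(net) < 2:
            continue
        others = [side[c] for c in net if c != cell]
        if all(s != side[cell] for s in others):
            g += 1                       # cell is the only one on its side: move uncuts net
        elif all(s == side[cell] for s in others):
            g -= 1                       # net entirely on cell's side: move cuts it
    return g

def fm_pass(nets, side, max_imbalance=2):
    cells = sorted({c for net in nets for c in net})
    free = set(cells)
    best_cut, best_side = cut_size(nets, side), dict(side)
    while free:
        def balance_ok(c):               # would moving c keep the halves roughly equal?
            counts = [sum(1 for x in cells if side[x] == s) for s in (0, 1)]
            counts[side[c]] -= 1
            counts[1 - side[c]] += 1
            return abs(counts[0] - counts[1]) <= max_imbalance
        candidates = [c for c in free if balance_ok(c)]
        if not candidates:
            break
        c = max(candidates, key=lambda c: gain(c, nets, side))
        side[c] = 1 - side[c]            # move the highest-gain free cell
        free.remove(c)                   # lock it for the rest of the pass
        cut = cut_size(nets, side)       # record the cost after this move
        if cut < best_cut:
            best_cut, best_side = cut, dict(side)
    side.update(best_side)               # roll back to the least-cost point in the sequence
    return best_cut

# Tiny illustration: 8 cells, 6 nets, start from an arbitrary 4/4 split.
nets = [[0, 1], [1, 2, 3], [3, 4], [4, 5], [5, 6, 7], [2, 6]]
side = {c: (0 if c < 4 else 1) for c in range(8)}
print(fm_pass(nets, side), side)
```

Each pass is hill climbing with memory: individual moves may temporarily worsen the cut, but the pass ends by rolling back to the best prefix, which is the "small amount of non-greediness" mentioned above.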

Ordering and Cheap Update
- Keep track of the net gain on a node == delta in net crossings if we move that node
  - cut cost after move = cost − gain
- Calculate the node gain as the Σ of the net gains for all nets at that node
  - each node is involved in several nets
- Sort nodes by gain
  - avoid a full resort on every move
[Figure: small example with nodes A, B, C]

FM Cell Gains
- Gain = delta in the number of nets crossing between partitions
       = sum of the net deltas for the nets on the node
[Figure: example cells annotated with their gains, e.g. −4, +4]

After move node?
- Update cost: newcost = cost − gain
- Also need to update gains
  - on all nets attached to the moved node
  - but moves are of nodes, so push the updates to all nodes touched by those nets

Composability of Net Gains
[Figure: the per-net gain contributions on a node simply add]

FM Recompute Cell Gain
- For each net, keep track of the number of cells in each partition: [F(net), T(net)]
  (F = the side the moving cell leaves, T = the side it joins)
- The update rules are built up incrementally over the next slides; the complete set is listed below.

FM Recompute Cell Gain (complete rules for moving a cell from side F to side T; sketch below)
- For each net attached to the moved cell:
  - if T(net) == 0, increment the gain of the F-side cells on the net
  - if T(net) == 1, decrement the gain of the single T-side cell on the net
  - decrement F(net), increment T(net)
  - if F(net) == 0, decrement the gain of all cells on the net (now all on T)
  - if F(net) == 1, increment the gain of the single remaining F-side cell

FM Recompute (example)
[Figure slides: the update rules applied to a small example, gains annotated on the cells]
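A hedged sketch of those update rules for a single move from side F to side T, applied only to cells that are still free (locked cells' gains no longer matter). The data layout (count, gain, nets_of, cells_of, free) is an assumption for illustration, not the lecture's exact representation.

```python
def move_cell(c, side, count, gain, nets_of, cells_of, free):
    """Move cell c to the other side and update gains incrementally.

    side[x]     : 0 or 1, current partition of cell x
    count[n]    : [cells of net n on side 0, cells on side 1]
    gain[x]     : current gain of cell x
    nets_of[x]  : nets attached to cell x
    cells_of[n] : cells on net n
    free        : set of unlocked cells
    """
    F, T = side[c], 1 - side[c]
    free.discard(c)                      # lock the moving cell; only free cells get updates
    for n in nets_of[c]:
        # before the move
        if count[n][T] == 0:             # net entirely on the F side
            for x in cells_of[n]:
                if x in free:
                    gain[x] += 1
        elif count[n][T] == 1:           # exactly one cell already on T
            for x in cells_of[n]:
                if x in free and side[x] == T:
                    gain[x] -= 1
        # update the net's partition counts
        count[n][F] -= 1
        count[n][T] += 1
        # after the move
        if count[n][F] == 0:             # net now entirely on the T side
            for x in cells_of[n]:
                if x in free:
                    gain[x] -= 1
        elif count[n][F] == 1:           # exactly one cell left on F
            for x in cells_of[n]:
                if x in free and side[x] == F:
                    gain[x] += 1
    side[c] = T
```

Each move therefore touches only the nets on the moved cell and the cells on those nets, which is the "K + fanout" work counted in the running-time slide later.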

FM Recompute (example, continued)
[Figure slides: further steps of the gain-update example]

FM Data Structures (sketch below)
- Partition counts A, B
- Two gain arrays
  - one per partition
  - key: constant-time cell update
- Per cell: successors (consumers), inputs, locked status
- Allow partition of size 5
- Cells binned by cost (gain)
  - constant-time update
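A minimal sketch of the bucketed gain structure implied above, assuming simple Python sets per gain value instead of the doubly linked lists of the original Fiduccia-Mattheyses implementation; one such structure would be kept per partition, and gains are bounded by the maximum number of nets on any cell.

```python
class GainBuckets:
    """Cells binned by gain so 'best free cell' and 'adjust a gain' stay cheap."""

    def __init__(self, max_gain):
        self.max_gain = max_gain                      # bound on |gain| (max nets on a cell)
        self.bucket = {g: set() for g in range(-max_gain, max_gain + 1)}
        self.gain_of = {}
        self.highest = -max_gain                      # highest possibly non-empty bin

    def insert(self, cell, gain):
        self.bucket[gain].add(cell)
        self.gain_of[cell] = gain
        self.highest = max(self.highest, gain)

    def adjust(self, cell, delta):                    # O(1): move the cell between bins
        g = self.gain_of[cell]
        self.bucket[g].remove(cell)
        self.insert(cell, g + delta)

    def remove(self, cell):                           # e.g. when the cell gets locked
        self.bucket[self.gain_of.pop(cell)].remove(cell)

    def pop_best(self):
        # walk the pointer down to the highest non-empty bin
        while self.highest > -self.max_gain and not self.bucket[self.highest]:
            self.highest -= 1
        if not self.bucket[self.highest]:
            return None
        cell = next(iter(self.bucket[self.highest]))
        self.remove(cell)
        return cell
```

The ±1 gain updates from the previous sketch would go through adjust(), keeping each update constant time, and "pick the best free cell the balance allows" becomes a pop_best() call on the appropriate partition's structure.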

Initial Partition
- Initial cut size?
- Identify gains?

[Worked example (figure slides): cells A through H are split into two halves with an initial cut of 6. Move lists rank the free cells on each side by gain. Moving D reduces the cut from 6 to 3 (gain +3); after updating gains, moving G reduces it to 2 (gain +1); moving H reduces it to 1 (gain +1).]

[Figure: final state of the example; gain sequence of the accepted moves: +3, +1, +1]

KLFM (recap)
- Randomly partition into two halves
- Repeat until no updates:
  - Start with all cells free
  - Repeat until no cells free:
    - Move the cell with the largest gain
    - Update costs of its neighbors
    - Lock the cell in place (record current cost)
  - Pick the least-cost point in the previous move sequence and use it as the next starting position
- Repeat for different random starting points

FM Optimization Sequence (example)
[Figure: cut cost over the sequence of moves within a pass]

FM Running Time?
- Assume:
  - constant number of passes to converge
  - constant number of random starts
- N cell moves in each pass
- Each move's gain update is K + fanout work (average fanout K)
  - assume at most K inputs to each node
  - for every net attached to the moved cell (K+1 nets)
  - for every node attached to those nets (O(K))
- Maintain the ordered list in O(1) per update
  - each affected IO only moves a cell's gain up or down by 1
- Running time: O(K²N)
- Algorithm is significant for its speed (more than its quality)

FM Starts?
- So, FM gives a not-bad solution quickly
- 2K random starts, 3K network -- Alpert/Kahng (restart-loop sketch below)
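A hedged sketch of that outer loop: several random initial partitions, FM passes repeated until a pass stops improving, keeping the best cut seen. It reuses the fm_pass sketch from earlier; the number of starts is an arbitrary assumed parameter.

```python
import random

def klfm(nets, n_cells, starts=10, seed=0):
    rng = random.Random(seed)
    best_cut, best_side = None, None
    for _ in range(starts):
        # random initial bisection
        order = list(range(n_cells))
        rng.shuffle(order)
        side = {c: (0 if i < n_cells // 2 else 1) for i, c in enumerate(order)}
        # repeat passes until no further improvement
        prev = None
        while True:
            cut = fm_pass(nets, side, max_imbalance=2)
            if prev is not None and cut >= prev:
                break
            prev = cut
        if best_cut is None or prev < best_cut:
            best_cut, best_side = prev, dict(side)
    return best_cut, best_side
```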

Weaknesses?
- Local, incremental moves only
  - hard to move clusters
  - no lookahead
- Stuck in local minima?
- Looks only at local structure

Time Permitting

Improving FM
- Clustering
- Initial partitions
- Runs
- Partition size freedom
- Following comparisons are from Hauck and Boriello '96

Clustering
- Group several leaf cells together into a cluster
- Run partitioning on the clusters
- Uncluster (keeping the partition assignment), iteratively
- Run partitioning again using the prior result as the starting point instead of a random start

Clustering Benefits
- Catches local connectivity which FM might miss
  - moving one element at a time, it is hard to move whole connected groups across the partition
- Faster (smaller N)
- METIS (fastest research partitioner) exploits this heavily

How Cluster?
- Random
  - cheap, some benefit for speed
- Greedy connectivity (sketch below)
  - examine cells in random order
  - cluster each to its most highly connected neighbor
  - 3% better cut, 6% faster than random
- Spectral (next week)
  - look for clusters in the placement (ratio-cut like)
- Brute-force connectivity (can be O(N²))
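A hedged sketch of the greedy-connectivity clustering described above: visit cells in random order and merge each into the neighboring cluster it shares the most nets with. The max_size cap is an assumption added to keep clusters small; the slides do not specify one.

```python
import random
from collections import defaultdict

def greedy_cluster(nets, n_cells, max_size=4, seed=0):
    cluster = list(range(n_cells))                     # cluster id for each cell
    size = {c: 1 for c in range(n_cells)}
    order = list(range(n_cells))
    random.Random(seed).shuffle(order)
    for c in order:
        conn = defaultdict(int)                        # shared nets with other clusters
        for net in nets:
            if c in net:
                for x in net:
                    if cluster[x] != cluster[c]:
                        conn[cluster[x]] += 1
        # merge into the most highly connected cluster that stays under the cap
        options = [k for k in conn if size[k] + size[cluster[c]] <= max_size]
        if options:
            best = max(options, key=lambda k: conn[k])
            old = cluster[c]
            size[best] += size[old]
            cluster = [best if k == old else k for k in cluster]
    # rewrite nets over cluster ids; drop nets that became internal to a cluster
    coarse = [sorted({cluster[x] for x in net}) for net in nets]
    return cluster, [net for net in coarse if len(net) > 1]
```

Partition the coarse netlist, then uncluster and rerun FM with that result as the starting point, as the Clustering slide describes.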

Initial Partitions?
- Random
- Pick a random node for one side
  - start imbalanced
  - run FM from there
- Pick a random node and breadth-first search to fill one half (sketch below)
- Pick a random node and depth-first search to fill one half
- Start with a spectral partition

Initial Partitions
- If run several times, pure random tends to win out
  - more freedom / variety of starts
  - more variation from run to run
  - the others get trapped in local minima

Number of Runs
[Table: improvement in the best cut as the number of runs increases]
- ...but?

[Figure: best cut found versus number of random starts; 2K random starts, 3K network (Alpert/Kahng)]

Unbalanced Cuts
- Increasing slack in the partition sizes may allow a lower cut size

Unbalanced Partitions
[Figure: cut size versus allowed imbalance]
- Small/large here is benchmark size, not small/large partition IO
- Following comparisons are from Hauck and Boriello '96
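A hedged sketch of the BFS-seeded start mentioned above: pick a random node and grow one half breadth-first over the netlist until it holds half the cells; FM then refines from that partition. Same netlist representation as the earlier sketches.

```python
import random
from collections import deque

def bfs_initial_partition(nets, n_cells, seed=0):
    rng = random.Random(seed)
    # adjacency: cells sharing a net are neighbors
    neighbors = {c: set() for c in range(n_cells)}
    for net in nets:
        for a in net:
            neighbors[a].update(x for x in net if x != a)
    start = rng.randrange(n_cells)
    half, frontier = {start}, deque([start])
    while frontier and len(half) < n_cells // 2:
        c = frontier.popleft()
        for x in sorted(neighbors[c]):
            if x not in half and len(half) < n_cells // 2:
                half.add(x)
                frontier.append(x)
    # pad with arbitrary cells if the search ran out (disconnected netlist)
    leftovers = [c for c in range(n_cells) if c not in half]
    while len(half) < n_cells // 2 and leftovers:
        half.add(leftovers.pop())
    return {c: (0 if c in half else 1) for c in range(n_cells)}
```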

Partitioning Summary
- Decompose the problem
- Find locality
- NP-complete problem
- Linear-time heuristic (KLFM)
- Many ways to tweak
  - Hauck/Boriello, Karypis

Today's Big Ideas
- Divide-and-conquer
- Exploit structure
  - look for sparsity/locality of interaction
- Techniques:
  - greedy incremental improvement
  - randomness (avoid bad cases, local minima)
  - incremental cost updates (time cost)
  - efficient data structures

Admin
- Reading for Monday online
- Assignment 2 due tomorrow
- Assignment 3 (4, 5, 6) out today