Parallel Simulated Annealing for VLSI Cell Placement Problem

Atanu Roy, Karthik Ganesan Pillai
Department of Computer Science, Montana State University Bozeman
{atanu.roy, k.ganeshanpillai}@cs.montana.edu

This research was supported by the Department of Computer Science, Montana State University Bozeman, and was done as part of the class project for CS 545 Parallel Computing, instructed by Dr. Year Back Yoo.

Abstract - Simulated annealing is a general adaptive heuristic that belongs to the class of non-deterministic algorithms. It has been applied to several combinatorial problems from various fields of science and engineering. A parallel simulated annealing procedure can give better results than a sequential one. This report discusses how parallel simulated annealing can be applied to the very large-scale integration (VLSI) cell placement problem. Experimental results using different numbers of parallel threads show that the modified simulated annealing obtains VLSI cell placement solutions that are better than the sequential simulated annealing results.

Index Terms - Simulated Annealing, Parallel Simulated Annealing, VLSI Cell Placement Problem

I. INTRODUCTION

VLSI cell placement problems are known to be NP-complete. Obtaining an exact solution by evaluating every possible placement to determine the best one would take time proportional to the factorial of the number of modules, so this method is impossible to use for circuits with any reasonable number of modules. A wide range of heuristic algorithms exists in the literature for efficiently arranging the logic cells on a VLSI chip, and simulated annealing is one such method. A strong feature of the simulated annealing method is that it is both effective and robust. Parallelizing simulated annealing helps to obtain better results because each parallel procedure can search a different part of the search space effectively. Our results show that parallel simulated annealing provides better cell placements than sequential simulated annealing.

The main objectives of a placement algorithm are to minimize the total chip area and the total estimated wire length for all the cells. We need to optimize chip area usage in order to fit more functionality into a given chip area, and we need to minimize wire length in order to reduce the capacitive delays associated with longer wires and speed up the operation of the chip. These goals are closely related, since minimizing the wire length automatically reduces the total chip area. Hence this report focuses on minimizing the wire length of the cell placements on the VLSI chip. The placement process is followed by routing, that is, determining the physical layout of the interconnects through the available space. Wire-length estimates are calculated using Manhattan geometry, that is, only horizontal and vertical lines are used to connect any two points. A common approach is to define a cost function that consists of the wire length and various penalties for module overlap, total chip area, and so on. The goal of the placement algorithm is to determine a placement with the minimum possible cost.

Simulated Annealing for VLSI Cell Placements

Simulated annealing is the most well-developed method available for module placement today. It is very time consuming but yields excellent results, and it is an excellent heuristic for solving combinatorial optimization problems. The basic procedure in simulated annealing is to accept all moves that result in a reduction in cost.
Moves that result in a cost increase are accepted with a probability that decreases as the size of the increase grows. A parameter T, called the temperature, is used to control the acceptance probability of the cost-increasing moves; higher values of T cause more such moves to be accepted. The acceptance probability is given by exp(-ΔC/T), where ΔC is the cost increase. In the beginning, the temperature is set to a very high value so that most of the moves are accepted. The temperature is then gradually decreased so that cost-increasing moves have less chance of being accepted. Ultimately, the temperature is reduced to a very low value so that only moves causing a cost reduction are accepted, and the algorithm converges to a low-cost configuration.

Parallel Simulated Annealing for VLSI Cell Placements

In this report a parallel simulated annealing approach is used for the cell placement problem. The approach we use is parallel moves, in which each process receives the initial configuration of the cells and each generates its own moves. If any one process finds a better solution than the current configuration, that solution is accepted as the best solution. All the other processes get the current best solution immediately and continue to search for a better solution from that configuration. Synchronization is maintained between the processes by using shared memory for the current configuration.

II. RELATED WORKS

Much research has gone into the VLSI cell placement problem, which is considered to be NP-complete. Many heuristics exist that address this problem, but one of the most elegant heuristics applied to the VLSI cell placement problem is simulated annealing (SA). The concept of SA was first implemented by [4] and has been extended over the years by many researchers. [1] is a prominent example in which SA is applied to the VLSI cell placement problem. The inherent nature of SA makes it a very good candidate for parallelization, and much research has gone into parallelizing SA. [2, 3] take different approaches to parallelizing SA and apply it to the VLSI chip design problem. Although we follow [2] in its approach to parallelizing SA, in our research we intend to show that the speed-up of parallel simulated annealing is proportional to the search space of the problem, which none of the referenced works have addressed.

III. SIMULATED ANNEALING

Simulated annealing is one of the most well-developed and widely used iterative techniques for solving optimization problems. It is a general heuristic and belongs to the class of non-deterministic algorithms. It has been applied to several combinatorial problems from various fields of science and engineering. These problems include the travelling salesman problem, graph partitioning, quadratic assignment, matching, linear arrangement, and scheduling. In the area of engineering, simulated annealing has been applied to VLSI design, image processing, code design, facilities layout, network topology design, etc.

The purpose of our algorithm is to find a placement of the standard cells such that the total estimated interconnection cost is minimized. We divide our algorithm into four principal components:
- Initial Configuration
- Move Generation Function
- Cost Function
- Annealing Schedule

Algorithm 1. Simulated Annealing
1.  PROCEDURE Simulated_Annealing;
2.    initialize;
3.    generate random configuration;
4.    WHILE stopping_criterion(loop_count, temperature) = FALSE
5.      WHILE inner_loop_criterion = FALSE
6.        new_configuration ← perturb(configuration);
7.        ΔC ← evaluate(new_configuration, configuration);
8.        IF ΔC < 0 THEN configuration ← new_configuration
9.        ELSE IF accept(ΔC, temperature) > random(0, 1)
10.         THEN configuration ← new_configuration;
11.       ENDIF
12.      ENDIF
13.     ENDWHILE
14.     temperature ← schedule(loop_count, temperature);
15.     loop_count ← loop_count + 1;
16.   ENDWHILE
17. END.
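To make the flow of Algorithm 1 concrete, the sketch below gives a minimal, runnable Python version of the same loop. It is only an illustration: the perturb and evaluate callables are placeholders standing in for the move generation and cost functions described later, and a single cooling rate is used here for simplicity, whereas the annealing schedule described below varies the rate by phase.

import math
import random

def simulated_annealing(config, perturb, evaluate, temperature=4e6,
                        moves_per_temperature=40, cooling_rate=0.95,
                        stop_temperature=1.0):
    """Generic simulated annealing loop following Algorithm 1."""
    cost = evaluate(config)
    while temperature > stop_temperature:            # stopping criterion
        for _ in range(moves_per_temperature):       # inner loop criterion
            candidate = perturb(config)
            delta_c = evaluate(candidate) - cost
            # Accept improving moves outright; accept worsening moves
            # with probability exp(-delta_c / T).
            if delta_c < 0 or math.exp(-delta_c / temperature) > random.random():
                config, cost = candidate, cost + delta_c
        temperature *= cooling_rate                  # annealing schedule
    return config, cost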

1. Initial Configuration

At first we decompose the circuit into individual cells and find the input and output cells of each cell. The diagram and the table below illustrate how we decompose a circuit.

Fig 1: Sample Circuit

Cell ID   In Cells   Out Cells
1         -          3
2         -          3
3         1, 2       -

Table 1: Cell Placement

We start our annealing procedure by placing the cells on the chip randomly. We calculate the total area of the circuit and place the cells so that they lie at equal distances from each other. Since the cells are placed randomly, the distances between them and the lengths of their interconnections will be large. Next we use the three functions described below to obtain the optimal placement for the chip.

2. Move Generation Function

To generate a new possible cell placement, we use two strategies:
a) Move a single cell randomly to a new location on the chip.
b) Swap the positions of two cells.
In our algorithm we use both strategies randomly: 50% of the move generation is done through random moves (a) and the other 50% through swapping (b).

3. Cost Function

The cost function in our algorithm comprises two components,

c = c1 + c2 ---- (i)

c1 is a measure of the total estimated wire length. For any cell, we find the wire length by calculating the horizontal and vertical distances between it and its out cells. Let d_h be the horizontal distance between cell i and its j-th out cell and d_v be the vertical distance between cell i and its j-th out cell. The total wire length for the chip is then

c1 = Σ_{i=1..n} Σ_{j ∈ outcells(i)} (d_h + d_v) ---- (ii)

where the outer summation is taken over all cells i in the circuit and the inner summation over the out cells j of cell i.

When a cell is moved or swapped it may happen that two cells overlap. Let O_ij denote the overlap between cells i and j. This overlap is clearly undesirable and should be minimized. In order to penalize overlaps severely, we square them so that larger overlaps receive larger penalties:

c2 = Σ_{i≠j} (O_ij)^2 ---- (iii)

In equation (iii), c2 denotes the total overlap on the chip.

When we generate a new move, we calculate the cost function for the newly generated configuration. If the new move has a lower cost than the previous best move, we accept it as the best move. If we find a solution that is not cost-optimal, we do not reject it completely: we define an Accept function, a probabilistic acceptance function that determines whether or not to accept the move. We have implemented an exponential function for the accept method. We accept non-cost-optimal solutions to give the annealing schedule a chance to move out of a local minimum it may have hit.

Fig 2: A typical annealing schedule

For example, if a certain annealing run reaches point B (a local minimum) in Fig 2 and non-cost-optimal solutions are never accepted, the annealing cannot reach the global minimum. By using the accept function we give the annealing schedule a chance to get out of the local minimum. By the nature of the accept function we use, the probability of accepting a non-cost-optimal solution is higher at the beginning of the annealing schedule. As the temperature decreases, so does the probability of accepting non-cost-optimal solutions, so the perturbations of the circuit are larger at higher temperatures than at lower temperatures.
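As an illustration of the cost model in equations (i)-(iii) and of the exponential accept rule, the following Python sketch computes the two cost components for a toy placement. The data layout assumed here (a dictionary mapping each cell ID to its lower-left coordinate, an out-cell list per cell, and a uniform cell width and height) is an assumption made for the example, not the representation used in our implementation; the accept helper also folds the random draw into the function for brevity.

import math
import random

def wire_length(cells, out_cells):
    """c1: Manhattan distance between each cell and each of its out cells (eq. ii)."""
    c1 = 0
    for i, (xi, yi) in cells.items():
        for j in out_cells.get(i, []):
            xj, yj = cells[j]
            c1 += abs(xi - xj) + abs(yi - yj)        # d_h + d_v
    return c1

def overlap_penalty(cells, width, height):
    """c2: sum of squared pairwise overlap areas, assuming uniform cell size (eq. iii)."""
    c2 = 0
    ids = list(cells)
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            (xa, ya), (xb, yb) = cells[ids[a]], cells[ids[b]]
            ox = max(0, min(xa + width, xb + width) - max(xa, xb))
            oy = max(0, min(ya + height, yb + height) - max(ya, yb))
            c2 += (ox * oy) ** 2
    return c2

def cost(cells, out_cells, width, height):
    """Total cost c = c1 + c2 (eq. i)."""
    return wire_length(cells, out_cells) + overlap_penalty(cells, width, height)

def accept(delta_c, temperature):
    """Exponential acceptance: keep improvements, otherwise accept with prob exp(-dC/T)."""
    return delta_c < 0 or random.random() < math.exp(-delta_c / temperature)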

4. Annealing Schedule

At first we start the annealing procedure from a very high temperature of 4E6. We reduce the temperature using

T_new = α_T · T ---- (iv)

where α_T, the cooling rate, is fixed by us. Initially we decrease the temperature rapidly (α_T = 0.8). In the middle portion of the annealing schedule we reduce the temperature slowly (α_T = 0.95), since this phase takes up most (about 75%) of the annealing schedule. At low temperatures, the temperature is decreased rapidly again (α_T = 0.8). The stopping condition is reached when the temperature falls below 1.

Within each temperature range we set the number of moves experimentally. Once the number of moves is set, we fix it for the remainder of the schedule. For example, in our research we set the number of iterations per temperature at 40, 16, and 8; these runs are documented in the results section of this report.

IV. PARALLEL ANNEALING

Since simulated annealing is used in a wide range of research areas, accelerating simulated annealing has been an important area of research. Several acceleration techniques have been proposed, which can be broadly grouped into three general categories [5, 6]:
- Designing faster serial annealing
  o Faster cooling schedules [8]
  o Clever move sets [9]
- Hardware acceleration [7]
- Employing parallelization in simulated annealing

Understandably, the most effective of these approaches is the third, since it can take advantage of the previous two. Moreover, parallel computation offers the opportunity to improve the solution by dividing the search space among the required number of nodes. Parallelization helps us tackle problems that would otherwise be impractical to solve with sequential computing.

The parallelization of simulated annealing generally follows two approaches [2]: move acceleration and multiple-trial parallelism [5]. Our two implementations, the first-best solution and random thread selection, are extensions of multiple-trial parallelism.

In move acceleration, a trial is generated and the move itself is parallelized by distributing its tasks among the various processors. This approach is not optimal because it does not support iso-efficiency: as the number of processors increases, the speed-up does not increase linearly.

In multiple-trial parallelism, the whole search space is parallelized. Each processor generates its own move and evaluates it in parallel. The processors are synchronized so that they concurrently search for an acceptable solution, as illustrated in the figure below.

Fig 3: Search space in multiple-trial parallelism

In the multiple-trial solution, the acceptable solution is synchronized. When the processors return with acceptable solutions, the best one is chosen and replaces the previous best solution. All the processors then take this solution, generate new moves from it, and compare the newly generated solutions against it to evaluate their acceptability. The drawback of this approach is that the high synchronization overhead decreases the speed-up of the parallel solution.

In the first-best solution, which we implemented as the first generation of our approach, we extend the concept of the multiple-trial solution. Since the major drawback of multiple-trial parallelism is its high synchronization overhead, we take the first acceptable solution among the threads that return acceptable solutions. Although this reduces the synchronization overhead considerably (it has the lowest synchronization overhead), the quality of the solutions it provides is not optimal.
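The following Python sketch illustrates one multiple-trial step with first-best selection using a thread pool. The propose helper and the pool-based structure are simplified assumptions made for illustration; the actual implementation keeps the current configuration in shared memory as described earlier rather than passing it around.

import math
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def propose(config, perturb, evaluate, cost, temperature):
    """One trial: generate a move and decide whether it is acceptable."""
    candidate = perturb(config)
    delta_c = evaluate(candidate) - cost
    acceptable = delta_c < 0 or random.random() < math.exp(-delta_c / temperature)
    return acceptable, candidate, delta_c

def first_best_step(config, cost, perturb, evaluate, temperature, n_workers=4):
    """Multiple-trial step: all workers search in parallel; the first
    acceptable move to come back replaces the current configuration."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        trials = [pool.submit(propose, config, perturb, evaluate, cost, temperature)
                  for _ in range(n_workers)]
        for trial in as_completed(trials):
            ok, candidate, delta_c = trial.result()
            if ok:                         # first-best: take the first acceptable move
                return candidate, cost + delta_c
    return config, cost                    # no worker found an acceptable move this round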
The random-thread solution is the second generation of our effort to parallelize simulated annealing. This algorithm is also an extension of the multiple-trial solution: each process generates new moves, and the master process synchronizes the processes by accepting a random move from the acceptable moves returned by them. If the processes fail to return any acceptable move, they are made to iterate again and bring in new acceptable solutions. To avoid the obvious deficiency of the first-best solution, instead of accepting the first acceptable solution we accept a random solution from all the acceptable solutions. This still reduces the synchronization overhead to a large degree and in the process provides solutions that are most of the time better than, and otherwise comparable to, those of the sequential algorithm.
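Only the selection rule changes in the random-thread solution: the master gathers all acceptable moves and picks one at random instead of taking the first. A sketch of that step, reusing the hypothetical propose helper from the previous sketch and an already-created pool, might look like this.

import random

def random_thread_step(config, cost, pool, perturb, evaluate, temperature, n_workers=4):
    """Gather every acceptable move, then let the master pick one at random
    (assumes the propose() helper sketched above)."""
    trials = [pool.submit(propose, config, perturb, evaluate, cost, temperature)
              for _ in range(n_workers)]
    acceptable = [(cand, delta) for ok, cand, delta in (t.result() for t in trials) if ok]
    if not acceptable:
        return config, cost                # no acceptable move: the workers iterate again
    candidate, delta_c = random.choice(acceptable)
    return candidate, cost + delta_c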

The algorithm has a linear speed-up when compared to its sequential counterpart.

Algorithm 3. Random Thread Solution
1.  PROCEDURE RANDOM_THREAD_SOLUTION;
2.    initialize;
3.    generate random configuration;
4.    PARBEGIN
5.      WHILE stopping_criterion(loop_count, temperature) = FALSE
6.        WHILE inner_loop_criterion = FALSE
7.          new_configuration ← perturb(configuration);
8.          SYNCHRONIZED
9.          do
10.           ΔC ← evaluate(new_configuration, configuration);
11.           IF ΔC < 0 THEN configuration ← new_configuration
12.           ELSE IF accept(ΔC, temperature) > random(0, 1)
13.             THEN configuration ← new_configuration;
14.           ENDIF
15.          ENDIF
16.          done
17.        ENDWHILE
18.        temperature ← schedule(loop_count, temperature);
19.        loop_count ← loop_count + 1;
20.      ENDWHILE
21.    PAREND
22. END.

V. EXPERIMENTAL RESULTS

We ran our experiments on cs-escher, a server in the Computer Science department of Montana State University Bozeman. The server consists of dual quad-core processors, each with a processing speed of 2.93 GHz, and has 16 GB of main memory. Three different numbers of threads (2, 4, and 8) are compared with a fixed number of iterations for the sequential implementation. In all three cases we obtained linear speed-up with results comparable to sequential simulated annealing. The tables below list the cell size, the number of trials at each temperature, and the best cost results for each number of threads.

The first set of results (Table 2) is for 40 iterations of the sequential trial. The number of iterations is therefore 20 when the number of threads is 2, 10 for 4 threads, and so on.

Cell Size   Sequential Trials - 40   Threads - 2   Threads - 4   Threads - 8
170         39618                    39599         39621         39625
85          36061                    36046         36050         36099
42          2686                     2714          2706          2690
8           144                      131           132           146

Table 2: Results for 40 iterations

The results below (Table 3) are for the random-thread solution, where the total number of iterations used in each case is 16. The quality of the solution is compared to a sequential trial of 16 iterations.

Cell Size   Sequential Trials - 16   Threads - 2   Threads - 4   Threads - 8
170         39665                    39683         39636         39692
85          36109                    36098         36054         36104
42          2763                     2717          2728          2717
8           140                      122           150           152

Table 3: Results for 16 iterations

VI. CONCLUSION

By parallelization we were able to achieve almost linear speed-up with quality results that were better than, or occasionally equivalent to, those of the sequential implementation. The overheads we incurred are transmission overhead and thread synchronization overhead.

VII. FUTURE WORK

The natural extension of this research is the best-thread solution. We believe that selecting the best solution among the outputs of the different parallel threads might improve the quality of the result. However, this also means there will be a large synchronization effort, which will decrease the speed-up of the parallel implementation.

VIII. REFERENCES

[1] K. Shahookar and P. Mazumder, "VLSI Cell Placement Techniques," ACM Computing Surveys, 1991.
[2] S. M. Sait and H. Youssef, Iterative Computer Algorithms with Applications in Engineering, IEEE Computer Society Press.
[3] R. M. Kling and P. Banerjee, "ESP: Placement by Simulated Evolution," IEEE Transactions on Computer-Aided Design, 1989.
[4] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, "Optimization by Simulated Annealing," Science, 1983.
[5] E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing.
[6] M. D. Huang, F. Romeo, and A. L. Sangiovanni-Vincentelli, "An efficient general cooling schedule for simulated annealing," IEEE International Conference on Computer-Aided Design, pp. 381-384, 1986.
[7] A. Iosupovici, C. King, and M. Breuer, "A module interchange placement machine," Proceedings of the 20th Design Automation Conference, pp. 171-174, 1983.
[8] F. Catthoor, H. De Man, and J. Vandewalle, "SAMURAI: A general and efficient simulated-annealing schedule with fully adaptive annealing parameters," Integration, 6:147-178, 1988.
[9] M. D. Durand, "Accuracy vs. speed in placement," IEEE Design & Test of Computers, pp. 8-34, 1989.