A Multithreaded Genetic Algorithm for Floorplanning

Jake Adriaens
ECE 556, Fall 2004

Introduction

I have chosen to implement the algorithm described in the paper "Distributed Genetic Algorithms for the Floorplan Design Problem" by J.P. Cohoon, S.U. Hegde, W.N. Martin, and D.S. Richards. Standard genetic algorithms operate sequentially, with every new solution depending on previous solutions. This is acceptable for small problems, but as the number of modules M in the floorplan increases, the number of possible solutions grows faster than M! (it lies between M! and M!·M·(M−1)). This is a very steep curve, which makes floorplanning a good problem to solve in a distributed fashion. The algorithm I implement is intended to provide a divide-and-conquer method for solving the floorplanning problem.

Multithreading Motivation

In the paper the algorithm is presented as distributed across a network of computers. I have chosen instead to implement it as a multithreaded program, for several reasons. The paper assumes that all of the genetic instances have a large shared memory to communicate through. In a network environment, separate instances would have to exchange data over the network, which is slow relative to computational speed. If the program were split into multiple processes on a single machine, the instances would have to communicate through inter-process techniques, because processes do not share memory; this would also be slower than a shared memory. In a multithreaded program the threads share memory, so they can exchange data directly through it.

Traditionally, multithreading has been avoided as a way of distributing a computationally intense algorithm, because older architectures supported only one thread in flight at a time. On such architectures, multithreading is useful mainly for I/O-bound programs.
If part of a program has to stop and wait for something slow, such as keyboard input, that thread can go to sleep and give processor time to the program's other threads. If the program is bound by the computational speed of the CPU, however, there is no point in making it multithreaded, because no extra CPU time can be gained. Modern architectures support multiple threads in flight at once (for example, hyper-threading). While one computationally complex part of a program is executing, another part may execute in another execution unit within the processor, or even on another processor that shares the same memory. As these architectures become more popular, there is more opportunity to distribute CPU-limited algorithms over multiple threads instead of over multiple processes or a network.

Algorithm

Genetic

The non-distributed genetic floorplanning algorithm is an important component of the distributed algorithm. It consists of four parts: spawning, crossover, mutation and merging.

The first step, spawning, generates the initial set of solutions to work with. To spawn solutions I start with the initial floorplan 12v3h4v5h6v. To make the next member of the initial population I mutate it N times, where N is the desired size of the initial population. A mutation consists of performing an M1, M2, or M3 move on the floorplan. M1 swaps two adjacent operands, for example 2 and 3. M2 complements some chain of operators (h's or v's). M3 swaps an adjacent operator and operand. Each subsequent member of the initial population is produced by performing N mutations on the previously generated solution; this is repeated N−1 times to obtain the N initial solutions. The population size N is passed on the command line at run time.

Crossover generates new, possibly better, solutions. Two solutions are chosen at random from the population, and one of four crossover functions is chosen to combine them into one new solution. The paper Distributed Genetic Algorithms for the Floorplan Design Problem illustrates each of the crossover functions graphically, with * and + representing vertical and horizontal cuts:

[Figure: Crossover 1, Crossover 2, Crossover 3, Crossover 4]
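The floorplan representation, the three moves, and the spawning procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the original program: a Polish expression is a list whose ints are module ids and whose strings "h"/"v" are cuts; `initial_expression` generalizes 12v3h4v5h6v to M modules by alternating v and h; M2 complements a whole maximal operator chain; M3 keeps only swaps that leave the expression valid (balloting property, no two identical adjacent operators); and the area cost assumes fixed-orientation rectangular modules whose sizes live once in a shared table, along the lines the Conclusion suggests. All names are hypothetical.

```python
import random

# Hypothetical sketch; names are illustrative, not from the original program.
OPERATORS = ("h", "v")  # horizontal and vertical cuts


def initial_expression(m):
    """Assumed generalization of 12v3h4v5h6v to m >= 2 modules."""
    expr = [1, 2, "v"]
    for module in range(3, m + 1):
        expr += [module, "h" if module % 2 == 1 else "v"]
    return expr


def is_valid(expr):
    """Balloting property plus normalization (no two identical adjacent operators)."""
    operands = operators = 0
    for i, tok in enumerate(expr):
        if tok in OPERATORS:
            operators += 1
            if operators >= operands or (i > 0 and expr[i - 1] == tok):
                return False
        else:
            operands += 1
    return operators == operands - 1


def m1(expr, rng):
    """M1: swap two operands that are adjacent in the operand sequence."""
    pos = [i for i, t in enumerate(expr) if t not in OPERATORS]
    k = rng.randrange(len(pos) - 1)
    expr[pos[k]], expr[pos[k + 1]] = expr[pos[k + 1]], expr[pos[k]]


def m2(expr, rng):
    """M2: complement a maximal chain of consecutive operators (h <-> v)."""
    starts = [i for i, t in enumerate(expr)
              if t in OPERATORS and expr[i - 1] not in OPERATORS]
    i = rng.choice(starts)
    while i < len(expr) and expr[i] in OPERATORS:
        expr[i] = "v" if expr[i] == "h" else "h"
        i += 1


def m3(expr, rng):
    """M3: swap an adjacent operand/operator pair; only legal swaps are used."""
    legal = []
    for i in range(len(expr) - 1):
        if (expr[i] in OPERATORS) != (expr[i + 1] in OPERATORS):
            trial = expr[:]
            trial[i], trial[i + 1] = trial[i + 1], trial[i]
            if is_valid(trial):
                legal.append(i)
    if not legal:
        return False
    i = rng.choice(legal)
    expr[i], expr[i + 1] = expr[i + 1], expr[i]
    return True


def mutate(expr, rng):
    """Return a copy of expr with one randomly chosen move applied."""
    expr = expr[:]
    move = rng.choice((m1, m2, m3))
    if move(expr, rng) is False:  # M3 found no legal swap: fall back to M1
        m1(expr, rng)
    return expr


def spawn(m, n, rng):
    """Each new member is the previous member mutated n times."""
    population = [initial_expression(m)]
    for _ in range(n - 1):
        expr = population[-1]
        for _ in range(n):
            expr = mutate(expr, rng)
        population.append(expr)
    return population


def area(expr, sizes):
    """Bounding-box area; sizes maps module id -> (width, height), shared by all floorplans."""
    stack = []
    for tok in expr:
        if tok in OPERATORS:
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if tok == "h":  # horizontal cut: one block above the other
                stack.append((max(w1, w2), h1 + h2))
            else:           # vertical cut: blocks side by side
                stack.append((w1 + w2, max(h1, h2)))
        else:
            stack.append(sizes[tok])
    width, height = stack.pop()
    return width * height


if __name__ == "__main__":
    sizes = {i: (1, 1) for i in range(1, 7)}  # toy unit-square modules
    for expr in spawn(6, 4, random.Random(0)):
        print("".join(map(str, expr)), area(expr, sizes))
```

Because M1 and M2 can never break a valid expression, only M3 needs the validity check; keeping module sizes in one shared dictionary means each floorplan is just a short list of tokens.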

The number of offspring produced by crossover is passed on the command line as a percentage of the total population.

Mutation, as described earlier, is one of the three move operations from the simulated annealing floorplanning algorithm. Mutation is applied only to the offspring produced by crossover; the percentage of offspring to mutate is passed on the command line. Mutation keeps the solutions from stagnating too quickly at a local minimum by introducing new possible floorplans.

The last step of the genetic algorithm is merging, which decides which offspring to keep for the next round. If an offspring has a smaller cost (area of the floorplan) than the population member with the worst cost, that member is thrown out and the offspring replaces it. This is done for every offspring.

The genetic algorithm proceeds as follows. First the initial solutions are spawned; then, for a number of generations, crossover, mutation and merging are performed. After the last generation the best solution is chosen. The number of generations is specified on the command line at run time. The pseudo-code for the genetic floorplanning algorithm is:

    Spawn initial population
    For G generations
        Produce offspring through crossover
        Mutate offspring
        Merge offspring into population
    Choose best solution

Distributed Genetic

The genetic algorithm is modified slightly to make it distributed. Several instances of the genetic algorithm are spawned and run independently and in parallel for a number of generations. After a set number of generations the instances stop and trade solutions with each other, which introduces diversity into their populations and keeps them from stagnating at local minima. They repeat this process for a set number of epochs, which can also be specified on the command line. After all epochs, the best solution is chosen from all the instances. To keep the separate instances from reaching the same local minimum, only one crossover function is used per instance: thread A uses crossover A mod 4. The distributed genetic algorithm with two threads is modeled as follows:

[Figure: model of the distributed genetic algorithm with two threads: Spawn Init Population; Init population, Run Genetic and Trade Data in each thread; Choose Best]
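The merging rule and the generation loop from the Genetic section can be sketched in Python. This is a minimal, generic sketch rather than the original code: `merge` and `run_genetic` are hypothetical names, the crossover, mutation and cost functions are passed in (for floorplans the cost would be the floorplan area), and the offspring and mutation counts are derived from the two command-line percentages. In the distributed version, instance A would simply be handed crossover function A mod 4.

```python
import random


def merge(population, offspring, cost):
    """Replace the worst population member with each offspring that beats it."""
    for child in offspring:
        # Index of the population member with the worst (largest) cost.
        worst = max(range(len(population)), key=lambda i: cost(population[i]))
        if cost(child) < cost(population[worst]):
            population[worst] = child


def run_genetic(population, generations, offspring_pct, mutation_pct,
                crossover, mutate, cost, rng):
    """Run G generations of crossover, mutation and merging; return the best."""
    n_offspring = max(1, round(len(population) * offspring_pct / 100))
    n_mutated = round(n_offspring * mutation_pct / 100)
    for _ in range(generations):
        # Crossover: combine two randomly chosen members into one child.
        offspring = [crossover(*rng.sample(population, 2), rng)
                     for _ in range(n_offspring)]
        # Mutation: applied only to a percentage of the offspring.
        for i in rng.sample(range(n_offspring), n_mutated):
            offspring[i] = mutate(offspring[i], rng)
        merge(population, offspring, cost)
    return min(population, key=cost)


if __name__ == "__main__":
    # Toy run: integers stand in for floorplans, cost = absolute value.
    best = run_genetic([50, -40, 30, 25], generations=20, offspring_pct=50,
                       mutation_pct=50,
                       crossover=lambda a, b, rng: (a + b) // 2,
                       mutate=lambda x, rng: x + rng.choice((-1, 1)),
                       cost=abs, rng=random.Random(1))
    print(best)
```

Since merging only ever evicts the current worst member, the best cost in the population can never get worse from one generation to the next.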

Code Highlights

One particularly hard problem in development was getting the threads to communicate. Each thread must give data once to every other thread and receive data once from every other thread; in other words, it is a handshaking problem. To make it harder, only the main program thread knows who all the other threads are; the threads that actually need to communicate have no idea who the others are. I solved the problem by creating a buffer that all the threads share. The main thread signals whose turn it is to write the buffer, the writing thread signals when the buffer is ready to be read, and each reading thread increments a counter when it has finished reading, so the main thread can check when all threads are done and it is safe to signal the next writer:

Main thread:
    Set the writer to thread 0
    For N threads
        Signal it is ok to write
        Wait until all threads except the writer have read the buffer
        Signal reading is not ok
        Set the read counter to 0
        Writer = writer + 1

Child threads:
    For N threads
        Wait until writing is ok
        If I am not the writer
            Wait until reading is ok
            Read the buffer
            Increment the read counter to signal I have finished reading
        Else
            Signal writing is not ok
            Write the buffer
            Signal reading is ok

Results

The results I obtained are misleading as to the effectiveness of the algorithm. They show the distributed genetic floorplanning algorithm running considerably slower than the simulated annealing floorplanning algorithm. There were two major reasons my algorithm did not perform as well as expected. The first is that I ran it on a machine that supported only a single thread in flight at once. Because I ran four threads in my tests, the distributed algorithm's run time increased by almost 4x (not quite 4x, because trading is done sequentially among the threads).
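The shared-buffer handshake described under Code Highlights can be sketched with Python's `threading.Condition`. This is an illustrative sketch, not the original code: `TradeBuffer`, `main_thread`, `child_thread` and `trade` are hypothetical names, and each thread is assumed to contribute a single object (for example, a copy of its best floorplan). One deliberate change from the pseudo-code: readers wait for the round's writer number together with the read-ok flag rather than for the write-ok flag, because the writer clears write-ok as soon as it starts, and a reader that had not yet observed the flag would otherwise wait forever. In CPython the global interpreter lock means this shows the synchronization only, not the CPU parallelism discussed above.

```python
import threading


class TradeBuffer:
    """Shared buffer and signals, all guarded by one condition variable."""

    def __init__(self):
        self.cond = threading.Condition()
        self.writer = -1        # whose turn it is to write (-1: no round yet)
        self.write_ok = False
        self.read_ok = False
        self.read_count = 0
        self.data = None


def main_thread(buf, n):
    """Only the main thread steps through the writers, one per round."""
    for writer in range(n):
        with buf.cond:
            buf.writer = writer
            buf.write_ok = True                  # signal it is ok to write
            buf.cond.notify_all()
            # Wait until every thread except the writer has read the buffer.
            buf.cond.wait_for(lambda: buf.read_count == n - 1)
            buf.read_ok = False                  # signal reading is not ok
            buf.read_count = 0


def child_thread(buf, me, n, gift, received):
    """Write `gift` on my own turn; read the buffer on every other turn."""
    for turn in range(n):
        with buf.cond:
            if turn == me:
                buf.cond.wait_for(lambda: buf.writer == turn and buf.write_ok)
                buf.write_ok = False             # signal writing is not ok
                buf.data = gift                  # write the buffer
                buf.read_ok = True               # signal reading is ok
            else:
                buf.cond.wait_for(lambda: buf.writer == turn and buf.read_ok)
                received.append(buf.data)        # read the buffer
                buf.read_count += 1              # signal I have finished reading
            buf.cond.notify_all()


def trade(gifts):
    """Thread i contributes gifts[i]; returns what each thread received."""
    n = len(gifts)
    buf = TradeBuffer()
    received = [[] for _ in range(n)]
    children = [threading.Thread(target=child_thread,
                                 args=(buf, i, n, gifts[i], received[i]))
                for i in range(n)]
    for child in children:
        child.start()
    main_thread(buf, n)
    for child in children:
        child.join()
    return received
```

For example, `trade(["a", "b", "c", "d"])` leaves thread 1 holding `["a", "c", "d"]`: exactly one item from every other thread, in writer order. The child threads only ever know their own index and N; the turn order is driven entirely by the main thread.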
The other major slowdown in my implementation was the floorplan data structure. I reused the data structure from my simulated annealing floorplanner. It was optimized for performing moves and undos and used a large amount of memory to make it faster, which was not an issue because the simulated annealing floorplanner used only two floorplan objects (one for the current solution and one for the best). The most common operations in the distributed algorithm are creating new floorplans (the crossover stage) and removing and copying floorplans (the merging and trading stages). Removing a floorplan requires dynamically deallocating memory, and copying or creating a floorplan requires dynamically allocating memory, each with a corresponding request to the operating system. The distributed genetic floorplanner also requires a large number of floorplan objects, which uses a large amount of memory (4 million objects overflowed 1 GB of RAM).

The following table summarizes the run times and areas produced by the multithreaded genetic and simulated annealing algorithms, followed by graphs of the results and of the parameters used to generate the multithreaded genetic solutions:

[Table: Number of Modules; Multithreaded Genetic: Area, Run-time (sec); Simulated Annealing: Area, Run-time (sec)]

[Figure: Area and Run-time versus Modules]

[Figure: Population, Epochs and Run-time (hundreds of secs) versus Modules]

The result I consider the most important measure of the algorithm is the number of solutions generated: with 200 modules the simulated annealing algorithm generates solutions, while the distributed genetic algorithm looks at 205 solutions per thread. This means that if generating a solution in the distributed genetic algorithm took only 10x as long as in the simulated annealing algorithm, then on a machine with four genetic threads in flight at once the run time would be about ¼ that of the simulated annealing algorithm.

The code size of the multithreaded genetic program is 40.8KB with an executable of 50.2KB, while the simulated annealing program has a code size of 20.1KB and an executable of 31.7KB.

Conclusion

Overall I am pleased with my implementation of the algorithm. I feel it would benefit greatly from a different floorplan data structure. With some restructuring, the data structure could avoid dynamically allocating and deallocating memory, which takes a large amount of time. It could also avoid storing the module sizes, since they are the same for every instance of a module in every floorplan; currently the size of a given module is stored in every floorplan, which is quite wasteful. If the data structure were rewritten, the merging function could be rewritten as well: it currently has O(M^4) complexity, where M is the number of modules in the floorplan, and unfortunately the current data structure makes this implementation necessary. I think this algorithm will work quite well on hyper-threaded machines; it is an efficient way to distribute the genetic algorithm and seems a natural extension of it.

References

Cohoon, J.P., Hegde, S.U., Martin, W.N., Richards, D.S. "Distributed Genetic Algorithms for the Floorplan Design Problem," IEEE Trans. Computer-Aided Design, vol. 10, no. 4, April.
Rose, J.B., Snelgrove, W.M., Vranesic, Z.G. "Parallel standard cell placement algorithms with quality equivalent to simulated annealing," IEEE Trans. Computer-Aided Design, vol. 7, no. 3, Mar.
Sait, S.M., Youssef, H. VLSI Physical Design Automation: Theory and Practice, World Scientific Publishing Co., River Edge, NJ.
