A Multithreaded Genetic Algorithm for Floorplanning
Jake Adriaens, ECE 556, Fall 2004
Introduction

I have chosen to implement the algorithm described in the paper "Distributed Genetic Algorithms for the Floorplan Design Problem" by J.P. Cohoon, S.U. Hegde, W.N. Martin, and D.S. Richards. Standard genetic algorithms operate sequentially, with every new solution depending on previous solutions. This is acceptable for small problems, but as the number of modules M in the floorplan increases, the number of possible solutions grows faster than M! (it lies between M! and M! * M * (M-1)). This is a very steep curve, which makes floorplanning a good problem to solve in a distributed fashion. The algorithm I implement is intended to provide a divide-and-conquer method for solving the floorplanning problem.

Multithreading Motivation

In the paper the algorithm is presented as distributed across a network of computers. I have chosen instead to implement it as a multithreaded program, for several reasons. The paper states that all of the genetic instances are assumed to have a large shared memory to communicate through. In a network environment, separate instances would have to exchange data over the network, which is slow relative to computational speed. If the program were multi-process on a single machine, the instances would have to communicate through inter-process techniques, because processes do not share memory, and this would also be slower than using a shared memory. In a multithreaded environment the threads share memory, so they can exchange data directly through it.

Traditionally, multithreading has been avoided as a way of distributing a computationally intensive algorithm, because older architectures support only one thread in flight at a time. In such an environment multithreading is useful mainly for I/O-bound programs.
If part of a program has to stop and wait for something slow, like keyboard input, that thread may go to sleep, giving processor time to the other threads of the program. If the program is bound by the computational speed of the CPU, there is no sense in making it multithreaded, because no extra CPU time can be gained. Modern architectures support multiple threads in flight at once (hyper-threading). This means that while one computationally complex part of a program is executing, another part may be executing on another arithmetic/logic core within the processor, or even on another processor that shares the same memory. As these architectures become more popular, there is more opportunity to distribute CPU-limited algorithms over multiple threads instead of over multiple processes or a network.

Algorithm

Genetic

The non-distributed genetic floorplanning algorithm is an important component of the distributed algorithm. The genetic algorithm consists of four parts: spawning, crossover, mutation and merging.

The initial step, spawning, generates the initial set of solutions to work with. To spawn solutions I start with the initial floorplan:

12v3h4v5h6v

To make the next member of the initial population I mutate it N times, where N is the desired size of the initial population. A mutation consists of performing an M1, M2, or M3 move on the floorplan. M1 swaps two adjacent operands, 2 and 3 for example. M2 complements some chain of operators (h's or v's). M3 swaps an adjacent operator and operand. Each further solution is made by performing N mutations on the previously generated solution; this is repeated N-1 times to generate the N desired initial solutions. The population size N is a parameter passed to the program on the command line at run-time.

Crossover is done to generate new, possibly better, solutions. To perform crossover, two solutions are chosen randomly from the population, then one of four functions is chosen to combine them into one new solution. The following are graphical illustrations of each of the crossover functions from the paper "Distributed Genetic Algorithms for the Floorplan Design Problem"; the * and + represent vertical and horizontal cuts:

[Figures: Crossover 1, Crossover 2, Crossover 3, Crossover 4]
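The spawning procedure and the three mutation moves described above can be sketched in Python as follows. This is a minimal illustration, not the project's code: the token-list representation, the function names, and the validity check (the usual rule that every prefix of a postfix floorplan expression contains more operands than operators) are assumptions introduced here.

```python
import random

OPERATORS = {"h", "v"}

def is_valid(expr):
    """Assumed validity rule: every prefix has more operands than operators."""
    operands = operators = 0
    for tok in expr:
        if tok in OPERATORS:
            operators += 1
        else:
            operands += 1
        if operators >= operands:
            return False
    return operands == operators + 1

def m1_swap_operands(expr, rng):
    """M1: swap two operands that are adjacent in operand order."""
    out = list(expr)
    idx = [i for i, t in enumerate(out) if t not in OPERATORS]
    k = rng.randrange(len(idx) - 1)
    i, j = idx[k], idx[k + 1]
    out[i], out[j] = out[j], out[i]
    return out

def m2_complement_chain(expr, rng):
    """M2: complement (h <-> v) one maximal run of consecutive operators."""
    out = list(expr)
    starts = [i for i, t in enumerate(out)
              if t in OPERATORS and (i == 0 or out[i - 1] not in OPERATORS)]
    i = rng.choice(starts)
    while i < len(out) and out[i] in OPERATORS:
        out[i] = "v" if out[i] == "h" else "h"
        i += 1
    return out

def m3_swap_operator_operand(expr, rng):
    """M3: swap an adjacent operator/operand pair, keeping the expression valid."""
    candidates = []
    for i in range(len(expr) - 1):
        if (expr[i] in OPERATORS) != (expr[i + 1] in OPERATORS):
            trial = list(expr)
            trial[i], trial[i + 1] = trial[i + 1], trial[i]
            if is_valid(trial):
                candidates.append(trial)
    # If no swap keeps the expression valid, return an unchanged copy.
    return rng.choice(candidates) if candidates else list(expr)

def mutate(expr, rng):
    """Apply one randomly chosen move (M1, M2 or M3)."""
    move = rng.choice([m1_swap_operands, m2_complement_chain,
                       m3_swap_operator_operand])
    return move(expr, rng)

def spawn(initial, n, rng):
    """Build n solutions; each new one is the previous one mutated n times."""
    population = [list(initial)]
    while len(population) < n:
        nxt = population[-1]
        for _ in range(n):
            nxt = mutate(nxt, rng)
        population.append(nxt)
    return population

if __name__ == "__main__":
    start = [1, 2, "v", 3, "h", 4, "v", 5, "h", 6, "v"]  # 12v3h4v5h6v
    for member in spawn(start, 5, random.Random(1)):
        print("".join(str(t) for t in member))
```

Every move returns a new list, so the parent solution is left untouched and can stay in the population.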
The number of offspring to make with the crossover functions is passed on the command line as a percentage of the total population.

Mutation, as described earlier, is one of the three move operations from the simulated annealing floorplanning algorithm. Mutation is only done on the offspring produced by crossover; the percentage of offspring to mutate is passed to the program on the command line. Mutation keeps the solutions from stagnating too quickly at a local minimum by introducing new possible floorplans.

The last step of the genetic algorithm is merging. This part of the algorithm decides which offspring to keep for the next round. If an offspring has a smaller cost (area of the floorplan) than the population member with the worst cost, that population member is thrown out and the offspring replaces it. This is done for all the offspring.

The genetic algorithm is performed as follows. First the initial solutions are spawned, then crossover, mutation and merging are performed for a number of generations. After all generations the best solution is chosen. The number of generations to run is specified on the command line at run-time. Here is the pseudo-code for the genetic floorplanning algorithm:

    Spawn initial population
    For G generations
        Produce offspring through crossover
        Mutate offspring
        Merge offspring into population
    Choose best solution

Distributed Genetic

The genetic algorithm is modified slightly to make it distributed. A number of instances of the genetic algorithm are spawned and run independently and in parallel for a number of generations. After a set number of generations the separate instances stop and trade solutions with each other, which introduces diversity into their populations and keeps them from stagnating at local minima. They then repeat this process for a set number of epochs, which can also be specified on the command line.
After all epochs the best solution is chosen from all the instances of the genetic algorithm. To keep the separate instances from reaching the same local minimum, only one crossover function is used per instance: thread A uses crossover function A mod 4. The following is a graphical model of the distributed genetic algorithm using two threads:

[Figure: distributed genetic algorithm with two threads. Spawn initial population; each thread takes its initial population, runs the genetic algorithm and trades data; finally the best solution is chosen.]
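The epoch structure described above can be sketched in Python as follows. This is only an illustration under several assumptions: the genome (a list of numbers), the cost function, the crossover and mutation stand-ins, and the choice to trade each instance's best solution are all placeholders invented here, not the paper's operators or the project's code. Synchronisation uses threading.Barrier rather than a hand-written protocol. In standard CPython the global interpreter lock keeps CPU-bound threads from running in parallel, so the sketch shows the structure of the algorithm, not a speedup.

```python
import random
import threading

def cost(genome):
    # Stand-in for floorplan area (assumption).
    return sum(x * x for x in genome)

def crossover(a, b, kind):
    # Placeholder for the four crossover functions: 'kind' only moves the cut.
    cut = (kind + 1) * len(a) // 5
    return a[:cut] + b[cut:]

def mutate(genome, rng):
    # Placeholder mutation: nudge one randomly chosen gene.
    out = list(genome)
    out[rng.randrange(len(out))] += rng.uniform(-1.0, 1.0)
    return out

def merge(population, offspring):
    # An offspring replaces the worst member only if it costs less.
    for child in offspring:
        worst = max(range(len(population)), key=lambda i: cost(population[i]))
        if cost(child) < cost(population[worst]):
            population[worst] = child

def island(tid, epochs, generations, pop_size, shared, barrier, results):
    rng = random.Random(tid)
    kind = tid % 4  # thread A uses crossover function A mod 4
    population = [[rng.uniform(-10, 10) for _ in range(6)]
                  for _ in range(pop_size)]
    for _ in range(epochs):
        for _ in range(generations):
            offspring = []
            for _ in range(pop_size // 2):
                a, b = rng.sample(population, 2)
                child = crossover(a, b, kind)
                if rng.random() < 0.3:
                    child = mutate(child, rng)
                offspring.append(child)
            merge(population, offspring)
        # Trade: publish my best, wait for everyone, merge in the others' bests.
        shared[tid] = list(min(population, key=cost))
        barrier.wait()
        merge(population, [list(s) for i, s in enumerate(shared) if i != tid])
        barrier.wait()  # nobody overwrites a slot before all have read it
    results[tid] = min(population, key=cost)

def run(n_threads=4, epochs=3, generations=20, pop_size=10):
    shared = [None] * n_threads
    results = [None] * n_threads
    barrier = threading.Barrier(n_threads)
    threads = [threading.Thread(target=island,
                                args=(t, epochs, generations, pop_size,
                                      shared, barrier, results))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return min(results, key=cost)  # best solution over all instances

if __name__ == "__main__":
    print(cost(run()))
```

Each thread writes only its own slot of the shared list, so the two barrier waits per epoch are enough to separate the write phase from the read phase.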
Code Highlights

One particularly hard problem to solve during development was having the threads communicate. Each thread must give data once to every other thread and must receive data once from every other thread; in other words, it is a handshaking problem. To make the problem even harder, only the main program thread knows who all the other threads are; the threads that actually need to communicate have no idea who the other threads are. I solved the problem by creating a buffer that all the threads share. The main program signals whose turn it is to write the buffer, the writing thread signals when the buffer is ready to be read, and the reading threads increment a counter when they are done reading, so the main thread can check when all the threads have finished reading and it is ok to signal the next writer:

Main thread:

    Set the writer to thread 0
    For N threads
        Signal it is ok to write
        Wait until all threads have read the buffer (except the writer)
        Signal reading is not ok
        Set the read counter to 0
        Writer = writer + 1

Child threads:

    For N threads
        Wait until writing is ok
        If I am not the writer
            Wait until reading is ok
            Read the buffer
            Increment the read counter to signal I have completed reading
        Else
            Signal writing is not ok
            Write the buffer
            Signal reading is ok

Results

The results I have come up with are misleading as to the effectiveness of the algorithm. They show the distributed genetic floorplanning algorithm performing considerably slower than the simulated annealing floorplanning algorithm. There were two major reasons my algorithm didn't perform as well as expected. The first is that I was running it on a machine that only supported a single thread in flight at once. Because I was running four threads in my tests, the distributed algorithm had an almost 4x increase in run-time (not quite 4x because trading is done sequentially among the threads).
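The sequential trading mentioned above is the shared-buffer handshake from Code Highlights. A minimal Python sketch of that protocol is given below. It is a variation rather than a transcription of the pseudo-code: the separate "writing ok" and "reading ok" flags are replaced by a writer index and a round marker guarded by a single threading.Condition, and all names (TradeBuffer, main_loop, child, trade) are hypothetical.

```python
import threading

class TradeBuffer:
    """Single shared buffer; all fields are guarded by one condition variable."""
    def __init__(self, n_threads):
        self.n = n_threads
        self.cond = threading.Condition()
        self.writer = -1       # whose turn it is to write (set by main)
        self.ready_round = -1  # last round whose data is in the buffer
        self.data = None
        self.read_count = 0

def main_loop(buf):
    # The main thread names each writer in turn and waits for N-1 reads.
    for w in range(buf.n):
        with buf.cond:
            buf.read_count = 0
            buf.writer = w
            buf.cond.notify_all()
            buf.cond.wait_for(lambda: buf.read_count == buf.n - 1)

def child(tid, buf, my_solution, received):
    # A child knows only its own id and the number of rounds.
    for rnd in range(buf.n):
        with buf.cond:
            buf.cond.wait_for(lambda: buf.writer == rnd)  # round is open
            if tid == rnd:
                buf.data = my_solution   # write the buffer
                buf.ready_round = rnd    # signal reading is ok
                buf.cond.notify_all()
            else:
                buf.cond.wait_for(lambda: buf.ready_round == rnd)
                received.append(buf.data)  # read the buffer
                buf.read_count += 1        # signal I have completed reading
                buf.cond.notify_all()

def trade(solutions):
    """Every thread sends its solution once and receives all the others."""
    n = len(solutions)
    buf = TradeBuffer(n)
    received = [[] for _ in range(n)]
    threads = [threading.Thread(target=child,
                                args=(t, buf, solutions[t], received[t]))
               for t in range(n)]
    for t in threads:
        t.start()
    main_loop(buf)
    for t in threads:
        t.join()
    return received

if __name__ == "__main__":
    for tid, got in enumerate(trade(["A", "B", "C", "D"])):
        print(tid, got)
```

The main thread advances the writer only after N-1 reads have been counted, so every child has passed the current round before the next one opens. Tracking the round explicitly avoids a child missing a "writing ok" signal that was set and cleared before it looked.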
The other major slowdown in my implementation of the algorithm was the floorplanning data structure. I reused my data structure from the simulated annealing floorplanner. This data structure was optimized for doing moves and undos and also used a large amount of memory (to make it faster), which wasn't an issue there because the simulated annealing floorplanner only used two floorplan objects (one for the current solution and one for the best). The most common operations in the distributed algorithm are making new floorplans (the crossover stage) and removing and copying floorplans (the merging and trading stages). Removing a floorplan involves an operating system call to dynamically deallocate memory, and copying and creating floorplans involve dynamically allocating memory with a corresponding request to the operating system. The distributed genetic floorplanner also requires a large number of floorplan objects, which ends up using a large amount of memory (4 million objects overflowed 1 GB of RAM).

Here is a summary of the run-times and areas produced by both the multithreaded genetic and simulated annealing algorithms, followed by a graph of the parameters required to generate the provided solutions in the multithreaded genetic algorithm:

[Table: Area and Run-time versus Modules. Columns: Number of Modules; Multithreaded Genetic: Area, Run-time (sec); Simulated Annealing: Area, Run-time (sec).]

[Figure: Population, Epochs and Run-time versus Modules. Plotted against Modules: Population, Epochs, Run-time (hundreds of secs).]

A result I feel is an important measure of the algorithm is the number of solutions generated: the simulated annealing algorithm generates solutions when there are 200 modules, while the distributed genetic algorithm looks at 205 solutions per thread for the same 200 modules. This means that if the solution generation time of the distributed genetic algorithm were cut down to 10x that of the simulated annealing algorithm, then on a machine that has four of the genetic threads in flight at once the run-time would be about 1/4 that of the simulated annealing algorithm.

The code size of the multithreaded genetic program is 40.8 KB with an executable of 50.2 KB, while the simulated annealing program has a code size of 20.1 KB and an executable of 31.7 KB.

Conclusion

Overall I am pleased with my implementation of the algorithm. I feel it would benefit greatly from a different floorplan data structure. With some restructuring, the data structure could avoid dynamically allocating and deallocating memory, which takes a large amount of time. The data structure could also avoid storing the module sizes, since they are the same for each instance of a module in every floorplan; currently the size of a given module is stored in every floorplan, which is quite wasteful. If the data structure is rewritten, the merging function could be rewritten as well. It currently has O(M^4) complexity, where M is the number of modules in the floorplan; unfortunately, with the current data structure it is necessary to use this implementation. I think this algorithm will work quite well on hyper-threaded machines. It is an efficient way to distribute the genetic algorithm and seems to be a natural extension of it.

References

Cohoon, J.P., Hegde, S.U., Martin, W.N., Richards, D.S. "Distributed Genetic Algorithms for the Floorplan Design Problem," IEEE Trans. Computer-Aided Design, vol. 10, no. 4, April.

Rose, J.B., Snelgrove, W.M., Vranesic, Z.G. "Parallel standard cell placement algorithms with quality equivalent to simulated annealing," IEEE Trans. Computer-Aided Design, vol. 7, no. 3, Mar.

Sait, S.M., Youssef, H. VLSI Physical Design Automation: Theory and Practice, World Scientific Publishing Co., River Edge, NJ.
More informationMEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS
MEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS INSTRUCTOR: Dr. MUHAMMAD SHAABAN PRESENTED BY: MOHIT SATHAWANE AKSHAY YEMBARWAR WHAT IS MULTICORE SYSTEMS? Multi-core processor architecture means placing
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Fall 2017
Caches and Memory Hierarchy: Review UCSB CS24A, Fall 27 Motivation Most applications in a single processor runs at only - 2% of the processor peak Most of the single processor performance loss is in the
More informationAdaptive Crossover in Genetic Algorithms Using Statistics Mechanism
in Artificial Life VIII, Standish, Abbass, Bedau (eds)(mit Press) 2002. pp 182 185 1 Adaptive Crossover in Genetic Algorithms Using Statistics Mechanism Shengxiang Yang Department of Mathematics and Computer
More informationECE4680 Computer Organization and Architecture. Virtual Memory
ECE468 Computer Organization and Architecture Virtual Memory If I can see it and I can touch it, it s real. If I can t see it but I can touch it, it s invisible. If I can see it but I can t touch it, it
More informationLet!s go back to a course goal... Let!s go back to a course goal... Question? Lecture 22 Introduction to Memory Hierarchies
1 Lecture 22 Introduction to Memory Hierarchies Let!s go back to a course goal... At the end of the semester, you should be able to......describe the fundamental components required in a single core of
More informationAdvanced Computer Architecture
ECE 563 Advanced Computer Architecture Fall 2009 Lecture 3: Memory Hierarchy Review: Caches 563 L03.1 Fall 2010 Since 1980, CPU has outpaced DRAM... Four-issue 2GHz superscalar accessing 100ns DRAM could
More informationPopularity of Twitter Accounts: PageRank on a Social Network
Popularity of Twitter Accounts: PageRank on a Social Network A.D-A December 8, 2017 1 Problem Statement Twitter is a social networking service, where users can create and interact with 140 character messages,
More information5. Computational Geometry, Benchmarks and Algorithms for Rectangular and Irregular Packing. 6. Meta-heuristic Algorithms and Rectangular Packing
1. Introduction 2. Cutting and Packing Problems 3. Optimisation Techniques 4. Automated Packing Techniques 5. Computational Geometry, Benchmarks and Algorithms for Rectangular and Irregular Packing 6.
More informationThe Limits of Sorting Divide-and-Conquer Comparison Sorts II
The Limits of Sorting Divide-and-Conquer Comparison Sorts II CS 311 Data Structures and Algorithms Lecture Slides Monday, October 12, 2009 Glenn G. Chappell Department of Computer Science University of
More informationOperating Systems Unit 6. Memory Management
Unit 6 Memory Management Structure 6.1 Introduction Objectives 6.2 Logical versus Physical Address Space 6.3 Swapping 6.4 Contiguous Allocation Single partition Allocation Multiple Partition Allocation
More informationData Storage and Query Answering. Data Storage and Disk Structure (2)
Data Storage and Query Answering Data Storage and Disk Structure (2) Review: The Memory Hierarchy Swapping, Main-memory DBMS s Tertiary Storage: Tape, Network Backup 3,200 MB/s (DDR-SDRAM @200MHz) 6,400
More informationProcess. Memory Management
Process Memory Management One or more threads of execution Resources required for execution Memory (RAM) Program code ( text ) Data (initialised, uninitialised, stack) Buffers held in the kernel on behalf
More informationProcess. One or more threads of execution Resources required for execution. Memory (RAM) Others
Memory Management 1 Process One or more threads of execution Resources required for execution Memory (RAM) Program code ( text ) Data (initialised, uninitialised, stack) Buffers held in the kernel on behalf
More informationPreview. Memory Management
Preview Memory Management With Mono-Process With Multi-Processes Multi-process with Fixed Partitions Modeling Multiprogramming Swapping Memory Management with Bitmaps Memory Management with Free-List Virtual
More informationhigh performance medical reconstruction using stream programming paradigms
high performance medical reconstruction using stream programming paradigms This Paper describes the implementation and results of CT reconstruction using Filtered Back Projection on various stream programming
More informationCache introduction. April 16, Howard Huang 1
Cache introduction We ve already seen how to make a fast processor. How can we supply the CPU with enough data to keep it busy? The rest of CS232 focuses on memory and input/output issues, which are frequently
More informationPCnet-FAST Buffer Performance White Paper
PCnet-FAST Buffer Performance White Paper The PCnet-FAST controller is designed with a flexible FIFO-SRAM buffer architecture to handle traffic in half-duplex and full-duplex 1-Mbps Ethernet networks.
More informationOutlook. Background Swapping Contiguous Memory Allocation Paging Structure of the Page Table Segmentation Example: The Intel Pentium
Main Memory Outlook Background Swapping Contiguous Memory Allocation Paging Structure of the Page Table Segmentation Example: The Intel Pentium 2 Backgound Background So far we considered how to share
More informationFAWN as a Service. 1 Introduction. Jintian Liang CS244B December 13, 2017
Liang 1 Jintian Liang CS244B December 13, 2017 1 Introduction FAWN as a Service FAWN, an acronym for Fast Array of Wimpy Nodes, is a distributed cluster of inexpensive nodes designed to give users a view
More informationProcess. One or more threads of execution Resources required for execution. Memory (RAM) Others
Memory Management 1 Learning Outcomes Appreciate the need for memory management in operating systems, understand the limits of fixed memory allocation schemes. Understand fragmentation in dynamic memory
More informationScalable Ambient Effects
Scalable Ambient Effects Introduction Imagine playing a video game where the player guides a character through a marsh in the pitch black dead of night; the only guiding light is a swarm of fireflies that
More informationFile Size Distribution on UNIX Systems Then and Now
File Size Distribution on UNIX Systems Then and Now Andrew S. Tanenbaum, Jorrit N. Herder*, Herbert Bos Dept. of Computer Science Vrije Universiteit Amsterdam, The Netherlands {ast@cs.vu.nl, jnherder@cs.vu.nl,
More informationUsing Genetic Algorithms to solve complex optimization problems. New Mexico. Supercomputing Challenge. Final Report. April 4, 2012.
Using Genetic Algorithms to solve complex optimization problems New Mexico Supercomputing Challenge Final Report April 4, 2012 Team 68 Los Alamos High School Team Members: Alexander Swart Teacher: Mr.
More informationCaches and Memory Hierarchy: Review. UCSB CS240A, Winter 2016
Caches and Memory Hierarchy: Review UCSB CS240A, Winter 2016 1 Motivation Most applications in a single processor runs at only 10-20% of the processor peak Most of the single processor performance loss
More informationFile system internals Tanenbaum, Chapter 4. COMP3231 Operating Systems
File system internals Tanenbaum, Chapter 4 COMP3231 Operating Systems Architecture of the OS storage stack Application File system: Hides physical location of data on the disk Exposes: directory hierarchy,
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part V Lecture 13, March 10, 2014 Mohammad Hammoud Today Welcome Back from Spring Break! Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+
More information