Supplementary Information
Boosting theoretical zeolitic framework generation for the prediction of new materials structures using GPU programming

Laurent A. Baumes,*a Frederic Kruger,b Santiago Jimenez,a Pierre Collet,b and Avelino Corma a

Table S1. Flynn's taxonomy. See Fig. S1 for architecture diagrams.

                 Single instruction   Multiple instruction
Single data      SISD                 MISD
Multiple data    SIMD                 MIMD

Figure S1. GPU architecture. Texture cache and atomics are not shown, for clarity.

Table S2. GeForce GTX 295 description.

Model                                  GeForce GTX 295
Year                                   2009
Average components size (nm)           55
Transistors (million)                  2x 1400
Die size (mm^2)                        2x 470
Number of dies                         2
Bus interface                          PCIe x
Memory min (MB)                        2x 896
Reference clock rate (MHz)
    Core                               576
    Shader                             1242
    Memory                             1998
Fillrate
    Pixel (GP/s)                       2x
    Texture (GT/s)                     2x
Reference memory configuration
    Bandwidth (GB/s)                   2x
    DRAM type                          GDDR3
    Bus width (bit)                    2x x14
Graphics library support (version)
    DirectX                            10.0
    OpenGL                             3.2
GFLOPs (MADD+MUL)
Figure S3. Diagram comparing architectures, where PU is a processing unit.

Figure S4. Parallel evolutionary loop. Evaluation is done in parallel on the GPU while the rest of the algorithm runs on the CPU.
Figure S5. Evolutionary algorithm minimizing the Weierstrass function in EASEA.

The idea behind EASEA was to let virtually any programmer try out an evolutionary algorithm by typing only the code specific to the problem to be solved. The implementation of the GPGPU algorithm that minimises a benchmark function called Weierstrass is presented here, and it contains little more than the following lines.

This is how the genome of a newly created individual is initialised. EASEA provides a random function that returns a random value between its two arguments, of the same type as those arguments (here, floats).

The evaluator section contains a straightforward C-like implementation of the function to be optimised, which anyone with basic programming skills in C should be able to write. This is the function that is sent to the GPGPU for parallel evaluation of the individuals (i.e. the genomes, each holding the array of floats to be tested by the function).

Once the code has been sent to the GPGPU, it runs there on its own and must therefore be fully autonomous, because it is cut off from the address space of the main program. Referring to global variables therefore makes no sense, nor does calling functions such as printf, which would have nowhere to print to. In fact, function calls are not allowed in GPGPU programs. If function calls are nevertheless found in the code (the call to the Abs function above, for instance), the compiler fetches the code of the function and inlines it automatically. Functions can thus be used in the code, but the programmer must keep in mind that they are inlined at compilation time and are therefore not true functions. Recursive functions cannot be inlined, so they must be turned into iterative functions before they can be used on a GPGPU.

A standard barycentric crossover is implemented.
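As an illustration of the crossover step, a barycentric crossover can be sketched in plain C as below. This is a minimal sketch, not the EASEA listing from the figure: the Genome struct, the GENOME_SIZE constant, the function name and the explicit alpha weight are all assumptions made for the example. Only the names child, parent1 and parent2 follow the text.

```c
#include <stddef.h>

/* Hypothetical genome layout: a fixed-size array of floats. */
#define GENOME_SIZE 4

typedef struct {
    float x[GENOME_SIZE];
} Genome;

/* Barycentric crossover: every gene of the child is a weighted mean of the
 * corresponding genes of the two parents. alpha is the weight given to
 * parent1 and is expected to lie in [0, 1]; how alpha is chosen is left to
 * the caller (an assumption of this sketch). */
void barycentric_crossover(const Genome *parent1, const Genome *parent2,
                           float alpha, Genome *child)
{
    for (size_t i = 0; i < GENOME_SIZE; ++i) {
        child->x[i] = alpha * parent1->x[i] + (1.0f - alpha) * parent2->x[i];
    }
}
```

With alpha = 0.5 the child lies midway between the parents; with alpha = 1 it is a copy of parent1.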
child, parent1 and parent2 are EASEA-defined pointers to the child to create and to the two parents selected for crossover.

In the mutator, tosscoin is a function provided by EASEA that returns 1 with the probability given by its argument (here pmutpergene). pmutpergene is a global variable; it can be used in the mutator because the mutator is executed on the main CPU, not on the GPGPU (which does not know about any global variables of the evolutionary program). MAX is a macro defined elsewhere by the user, and X_MIN and X_MAX are global variables. Since EASEA feeds the mutation function with a new child resulting from the above crossover, the Genome variable can be used directly.

Finally, the program ends with a section containing default run parameters that specify the evolutionary algorithm to be used:

- Number of generations
- Probability of calling the mutation and crossover operators
- Population size
- The selection method used to choose the parents for a crossover
- The number of children (offspring) per generation (100% of the population size)
- The number of parents that will compete with the children to make it to the next generation (50% of the population size)
- How competing parents will be selected
- How individuals from the temporary population (competing parents + offspring) will be selected
- Whether elitism should be implemented or not
- Whether the fitness function should be maximised or minimised

The .ez file containing these sections is compiled by typing:

$ easea weierstrass.ez

on the command line.

cuda: outputs code for any NVIDIA GPGPU card. When this option is used, the evaluation function is sent to the GPGPU and run in parallel on the population to be evaluated. The rest of the algorithm that manages the population (selections, crossovers, mutations, reductions, ...) stays on the host CPU and executes sequentially. Speedup therefore depends only on the population size, the size of the genome and the evaluation time.
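The host-side mutator described above can be sketched in plain C as follows. This is only an illustration under stated assumptions: toss_coin and uniform01 are stand-ins written for this example (they are not EASEA's tosscoin or random), the Genome struct, GENOME_SIZE, the step parameter and the CLAMP macro are hypothetical, and the bounds are passed as arguments rather than read from the global X_MIN and X_MAX.

```c
#include <stdlib.h>

/* Hypothetical genome layout: a fixed-size array of floats. */
#define GENOME_SIZE 4

typedef struct {
    float x[GENOME_SIZE];
} Genome;

/* Hypothetical helper macro keeping a value inside [lo, hi]. */
#define CLAMP(v, lo, hi) ((v) < (lo) ? (lo) : ((v) > (hi) ? (hi) : (v)))

/* Uniform random number in [0, 1), built on the C library's rand(). */
static double uniform01(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Stand-in for a coin toss: returns 1 with probability p, otherwise 0. */
static int toss_coin(double p)
{
    return uniform01() < p;
}

/* Mutates each gene independently with probability p_mut_per_gene by adding
 * a uniform perturbation in [-step, step), then clamps the gene to
 * [x_min, x_max]. Returns the number of genes that were mutated. */
int mutate(Genome *g, double p_mut_per_gene, float step,
           float x_min, float x_max)
{
    int mutated = 0;
    for (int i = 0; i < GENOME_SIZE; ++i) {
        if (toss_coin(p_mut_per_gene)) {
            float delta = (float)((2.0 * uniform01() - 1.0) * step);
            float v = g->x[i] + delta;
            g->x[i] = CLAMP(v, x_min, x_max);
            ++mutated;
        }
    }
    return mutated;
}
```

Because this code runs on the CPU, it is free to call library functions such as rand(), which would not be available to the evaluation function running on the GPGPU.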
Figure S5. AFX fitness landscape.
Fitness Function (Pseudo code)

Constants definition:
  ANGLE_MIN  ANGLE_MAX  ANGLE_AVG_OPT  ANGLE_AVG_OPT_MIN  ANGLE_AVG_OPT_MAX
  DIST_MIN  DIST_MAX  DIST_OPT  DIST_OPT_MIN  DIST_OPT_MAX
  DIST_MIN_SQ  DIST_MAX_SQ  DIST_OPT_SQ  DIST_OPT_MIN_SQ  DIST_OPT_MAX_SQ

Inputs:
  UnitCell uc;           // Unit cell dimensions (a, b, c, α, β, γ)
  Atom[] auatoms;        // Asymmetric unit atom coordinates (T-atoms)
  int[] aumultiplicity;  // Symmetry multiplicity for each T-atom
  Atom[] ucatoms;        // Unit cell atom coordinates (symmetry operations over T-atoms)
  Atom[] nbatoms;        // Atoms in the neighbour cells; only the atoms that could create links with ucatoms

Function GPU_GetFitness:

  // Local variables
  float distsq;          // Squared distance between two given atoms
  float dist;            // Distance between two given atoms
  float aux;             // Auxiliary value used to get the fitness
  float errors;          // Distance and nblink error measure
  float errors3mr;       // 3MR error measure
  float errorsangles;    // Angle error measure
  int mult;              // Current atau multiplicity
  Atom linkedatoms[];    // Atoms linked to atau
  float linkeddists[];   // Link distances to atau
  int nblinks;           // Current number of link distances
  float angle;           // Angle for three given atoms
  float avgangles;       // Average of the angles for the current atau atom
  int nbangles;          // Current number of angles formed with atau

  // Local variables initialization
  int NBLinkErrors = 4 * auatoms.count;
  float Fitness = (4 * ucatoms.count * DIST_OPT) + (6 * ucatoms.count * ANGLE_AVG_OPT);
  aux = 0.0f;
  errors = 0.0f;
  errors3mr = 0.0f;
  errorsangles = 0.0f;

  foreach(atom atau in auatoms)
    nblinks = 0;
    mult = aumultiplicity[atau];

    // au x uc
    foreach(atom atuc in ucatoms)
      if(atau is not atuc)
        distsq = distancesq(uc, atau, atuc);
        if(distsq in [DIST_MIN_SQ, DIST_MAX_SQ])
          dist = sqrt(distsq);
          addatom(linkedatoms, atuc);
          adddist(linkeddists, dist);
          nblinks++;
          if(nblinks <= 4)
            if(dist in [DIST_OPT_MIN, DIST_OPT_MAX])
              aux += (DIST_OPT * mult);
            else
              aux += (DIST_OPT - abs(DIST_OPT - dist)) * mult;
            NBLinkErrors--;
          else  // more than 4 links
            errors += DIST_OPT * mult * (1.0f + abs(DIST_OPT - dist));
            NBLinkErrors++;
        else if(distsq < DIST_MIN_SQ)  // too close distance
          errors += DIST_OPT_SQ * mult * (1.0f + abs(DIST_MIN_SQ - distsq));

    // au x nb
    foreach(atom atnb in nbatoms)
      distsq = distancesq(uc, atau, atnb);
      if(distsq in [DIST_MIN_SQ, DIST_MAX_SQ])
        dist = sqrt(distsq);
        addatom(linkedatoms, atnb);
        adddist(linkeddists, dist);
        nblinks++;
        if(nblinks <= 4)
          if(dist in [DIST_OPT_MIN, DIST_OPT_MAX])
            aux += (DIST_OPT * mult);
          else
            aux += (DIST_OPT - abs(DIST_OPT - dist)) * mult;
          NBLinkErrors--;
        else  // more than 4 links
          errors += DIST_OPT * mult * (1.0f + abs(DIST_OPT - dist));
          NBLinkErrors++;
      else if(distsq < DIST_MIN_SQ)  // too close distance
        errors += DIST_OPT_SQ * mult * (1.0f + abs(DIST_MIN_SQ - distsq));

    // Reset the number and average angles values for this atau
    nbangles = 0;
    avgangles = 0.0f;

    foreach(atom linkedatomj in linkedatoms, starting at j = 0)
      foreach(atom linkedatomk in linkedatoms, starting at k = j + 1)
        // 3MR
        distsq = distancesq(uc, linkedatomj, linkedatomk);
        if(distsq in [DIST_MIN_SQ, DIST_MAX_SQ])
          errors3mr += DIST_OPT_SQ * mult * (1.0f + (DIST_MAX_SQ - distsq));

        // ANGLES
        dist = sqrt(distsq);
        angle = getangle(atau, linkedatomj, linkedatomk);
        if(angle in [ANGLE_MIN, ANGLE_MAX])
          nbangles++;
          if(nbangles <= 6)
            avgangles += angle;
          else  // more than 6 angles
            errorsangles += ANGLE_AVG_OPT - abs(ANGLE_AVG_OPT - angle);
        else
          errorsangles += ANGLE_AVG_OPT - abs(ANGLE_AVG_OPT - angle);

    // only takes the first 6 angles
    if(nbangles > 6)
      nbangles = 6;
    if(nbangles > 0)
      avgangles = avgangles / nbangles;
      if(avgangles in [ANGLE_AVG_OPT_MIN, ANGLE_AVG_OPT_MAX])
        aux += ANGLE_AVG_OPT * nbangles * mult;
      else
        aux += (ANGLE_AVG_OPT - abs(ANGLE_AVG_OPT - avgangles)) * nbangles * mult;

  Fitness -= aux;
  if(Fitness >= 0)
    Fitness += errors + errors3mr + errorsangles;
  else
    Fitness = abs(Fitness - errors - errors3mr - errorsangles);

  output [Fitness];
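Two small pieces of the pseudocode above can be written as ordinary C functions to make the scoring logic easier to follow: the reward added to aux for one accepted link, and the final combination of the reward and error terms. This is a sketch under assumptions, not the authors' GPU code: the function names and the absf helper are hypothetical, the constants are passed as parameters because their values are not given here, the two aux updates are read as the branches of an if/else, and the optimal range is taken to include its end points.

```c
/* Absolute value written out by hand so that the sketch needs no math
 * library and no function call that could not be inlined. */
static float absf(float v)
{
    return v < 0.0f ? -v : v;
}

/* Reward contributed to aux by one accepted link of length dist.
 * A link inside [dist_opt_min, dist_opt_max] earns the full dist_opt;
 * otherwise the reward shrinks with the deviation from dist_opt.
 * mult is the symmetry multiplicity of the current T-atom. */
float link_reward(float dist, float dist_opt,
                  float dist_opt_min, float dist_opt_max, int mult)
{
    if (dist >= dist_opt_min && dist <= dist_opt_max) {
        return dist_opt * (float)mult;
    }
    return (dist_opt - absf(dist_opt - dist)) * (float)mult;
}

/* Final combination: subtract the accumulated reward from the initial
 * fitness value, then add the error terms so that the result is never
 * negative and grows with the errors. */
float combine_fitness(float initial_fitness, float aux, float errors,
                      float errors3mr, float errorsangles)
{
    float fitness = initial_fitness - aux;
    if (fitness >= 0.0f) {
        return fitness + errors + errors3mr + errorsangles;
    }
    return absf(fitness - errors - errors3mr - errorsangles);
}
```

Under this reading, a structure in which every T-atom has four links of optimal length and six angles with an optimal average, and no errors, brings the fitness down to zero, which matches the way Fitness is initialised in the pseudocode.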
Description of files in the Supplementary Information (compressed as a Zip archive)

Note that the CIF files contain oxygen and have been optimized using GULP.

- CIF_DATA/UNIT_CELL_A_SPG_74: Subset of solutions for Unit Cell A, defined by dimensions a, b, c = , , , angles α, β, γ = 90, 90, 90, and space group Imma (74), in CIF format, with 6 and 8 T-atoms respectively.
- CIF_DATA/UNIT_CELL_B_SPG_74: Subset of solutions for Unit Cell B, defined by dimensions a, b, c = , , , angles α, β, γ = 90, 90, 90, and space group Imma (74), in CIF format, with 6 and 8 T-atoms respectively.
- CIF_DATA/UNIT_CELL_C_SPG_46: Subset of solutions for Unit Cell C, defined by dimensions a, b, c = , , , angles α, β, γ = 90, 90, 90, and space group Ima2 (46), in CIF format, with 10, 12 and 14 T-atoms respectively.
- GULP_DATA: Contains an example of the input and output GULP files for solution #21 in Unit Cell A.

The subset of structures contained in the archive is the following (grey rows refer to the structures that are included in the manuscript):

Table S1. Subset of solutions for Unit Cell A, defined by dimensions a, b, c = , , , angles α, β, γ = 90, 90, 90, and space group Imma (74).

T_Atoms#   Solution_#   Energy   Fitness

Table S2. Subset of solutions for Unit Cell B, defined by dimensions a, b, c = , , , angles α, β, γ = 90, 90, 90, and space group Imma (74).

T_Atoms#   Solution_#   Energy   Fitness

Table S3. Subset of solutions for Unit Cell C, defined by dimensions a, b, c = , , , angles α, β, γ = 90, 90, 90, and space group Ima2 (46).

T_Atoms#   Solution_#   Energy   Fitness
More informationComputing on GPUs. Prof. Dr. Uli Göhner. DYNAmore GmbH. Stuttgart, Germany
Computing on GPUs Prof. Dr. Uli Göhner DYNAmore GmbH Stuttgart, Germany Summary: The increasing power of GPUs has led to the intent to transfer computing load from CPUs to GPUs. A first example has been
More informationAutomatic FFT Kernel Generation for CUDA GPUs. Akira Nukada Tokyo Institute of Technology
Automatic FFT Kernel Generation for CUDA GPUs. Akira Nukada Tokyo Institute of Technology FFT (Fast Fourier Transform) FFT is a fast algorithm to compute DFT (Discrete Fourier Transform). twiddle factors
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationn N c CIni.o ewsrg.au
@NCInews NCI and Raijin National Computational Infrastructure 2 Our Partners General purpose, highly parallel processors High FLOPs/watt and FLOPs/$ Unit of execution Kernel Separate memory subsystem GPGPU
More informationNVIDIA GT730 D3 1024MB VHDCI to 4 DVI-D PCIe ADD-IN BOARD. Datasheet. Advantech model number: AEGX-N0A4-V4LMS1
NVIDIA GT730 D3 1024MB VHDCI to 4 DVI-D PCIe ADD-IN BOARD Datasheet Advantech model number: AEGX-N0A4-V4LMS1 CONTENTS 1. Feature... 3 2. Functional Overview... 4 2.1. GPU Block diagram... 4 2.2. Key Features...
More informationReal-time Graphics 9. GPGPU
Real-time Graphics 9. GPGPU GPGPU GPU (Graphics Processing Unit) Flexible and powerful processor Programmability, precision, power Parallel processing CPU Increasing number of cores Parallel processing
More informationHow to build a Megacore microprocessor. by Andreas Olofsson (MULTIPROG WORKSHOP 2017)
How to build a Megacore microprocessor by Andreas Olofsson (MULTIPROG WORKSHOP 2017) 1 Disclaimers 2 This presentation summarizes work done by Adapteva from 2008-2016. Statements and opinions are my own
More information2.11 Particle Systems
2.11 Particle Systems 320491: Advanced Graphics - Chapter 2 152 Particle Systems Lagrangian method not mesh-based set of particles to model time-dependent phenomena such as snow fire smoke 320491: Advanced
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationTop500 Supercomputer list
Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity
More informationThreading Hardware in G80
ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &
More informationMultimedia in Mobile Phones. Architectures and Trends Lund
Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson
More informationCUDA Performance Considerations (2 of 2)
Administrivia CUDA Performance Considerations (2 of 2) Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011 Friday 03/04, 11:59pm Assignment 4 due Presentation date change due via email Not bonus
More informationWindowing System on a 3D Pipeline. February 2005
Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April
More informationMessage Passing Interface (MPI)
CS 220: Introduction to Parallel Computing Message Passing Interface (MPI) Lecture 13 Today s Schedule Parallel Computing Background Diving in: MPI The Jetson cluster 3/7/18 CS 220: Parallel Computing
More informationSpring 2009 Prof. Hyesoon Kim
Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on
More informationPart IV. Review of hardware-trends for real-time ray tracing
Part IV Review of hardware-trends for real-time ray tracing Hardware Trends For Real-time Ray Tracing Philipp Slusallek Saarland University, Germany Large Model Visualization at Boeing CATIA Model of Boeing
More informationgpot: Intelligent Compiler for GPGPU using Combinatorial Optimization Techniques
gpot: Intelligent Compiler for GPGPU using Combinatorial Optimization Techniques Yuta TOMATSU, Tomoyuki HIROYASU, Masato YOSHIMI, Mitsunori MIKI Graduate Student of School of Ewngineering, Faculty of Department
More informationEvolutionary Computation. Chao Lan
Evolutionary Computation Chao Lan Outline Introduction Genetic Algorithm Evolutionary Strategy Genetic Programming Introduction Evolutionary strategy can jointly optimize multiple variables. - e.g., max
More informationG P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G
Joined Advanced Student School (JASS) 2009 March 29 - April 7, 2009 St. Petersburg, Russia G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Dmitry Puzyrev St. Petersburg State University Faculty
More information