AUTOMATIC PARALLELISATION OF SOFTWARE USING GENETIC IMPROVEMENT. Bobby R. Bruce
|
|
- Lucas Gibson
- 6 years ago
- Views:
Transcription
1 AUTOMATIC PARALLELISATION OF SOFTWARE USING GENETIC IMPROVEMENT Bobby R. Bruce
2 INSPIRATION Samsung Galaxy S7
3 INSPIRATION Mali-T880 MP12 Samsung Galaxy S7
4 INSPIRATION Mali-T880 MP12 Samsung Galaxy S7 Intel i7-2500k (overclocked to 5GHz)
5 INSPIRATION Mali-T880 MP12 Samsung Galaxy S7 Intel i7-2500k (overclocked to 5GHz)
6 INSPIRATION Mali-T880 MP12 Intel i7-2500k (overclocked to 5GHz) 70 GFLOPs Samsung Galaxy S7
7 INSPIRATION Mali-T880 MP GFLOPs Samsung Galaxy S7 Intel i7-2500k (overclocked to 5GHz) 70 GFLOPs
8 INSPIRATION nvidia GTX GFLOPs Mali-T880 MP GFLOPs Intel i7-2500k (overclocked to 5GHz) 70 GFLOPs
9 WHY DON T WE UTILISE THIS POWERFUL HARDWARE? Developers lack the skills Hardware specialisation Developers time is expensive; translating code to run on the GPU is expensive Getting decent optimisation requires manual trial and error
10 WHY DON T WE UTILISE THIS POWERFUL HARDWARE? An Automated approach would be ideal Developers lack the skills Hardware specialisation Developers time is expensive; translating code to run on the GPU is expensive Getting decent optimisation requires manual trial and error
11 BACKGROUND: WHAT S CURRENTLY AVAILABLE?
12 BACKGROUND: WHAT S CURRENTLY AVAILABLE? Domain Pros Cons
13 BACKGROUND: WHAT S CURRENTLY AVAILABLE? Domain Pros Cons Automatic Parallelisation Compilers Does not require any skills, or knowledge of, parallelisation Only targets very specific loops where dependencies are fully understood
14 BACKGROUND: WHAT S CURRENTLY AVAILABLE? Domain Pros Cons Automatic Parallelisation Compilers CUDA/OpenCL Does not require any skills, or knowledge of, parallelisation When implemented well offers the best performance Only targets very specific loops where dependencies are fully understood Difficult to learn, harder to master. Very Manual
15 BACKGROUND: WHAT S CURRENTLY AVAILABLE? Domain Pros Cons Automatic Parallelisation Compilers CUDA/OpenCL Directive-based Does not require any skills, or knowledge of, parallelisation When implemented well offers the best performance Considerably easier to implement. Only targets very specific loops where dependencies are fully understood Difficult to learn, harder to master. Still requires some skill, practise, and trial and error. Very Manual
16 BACKGROUND: WHAT S CURRENTLY AVAILABLE? Domain Pros Cons Automatic Parallelisation Compilers CUDA/OpenCL Directive-based Does not require any skills, or knowledge of, parallelisation When implemented well offers the best performance Considerably easier to implement. Only targets very specific loops where dependencies are fully understood Difficult to learn, harder to master. Still requires some skill, practise, and trial and error. Very Manual
17 BACKGROUND: WHAT S CURRENTLY AVAILABLE? Domain Pros Cons Automatic Parallelisation Compilers CUDA/OpenCL Directive-based Does not require any skills, or knowledge of, parallelisation When implemented well offers the best performance Considerably easier to implement. Only targets very specific loops where dependencies are fully understood Difficult to learn, harder to master. Still requires some skill, practise, and trial and error. Very Manual
18 BACKGROUND: WHAT S CURRENTLY AVAILABLE? Domain Pros Cons Automatic Parallelisation Compilers CUDA/OpenCL Directive-based Does not require any skills, or knowledge of, parallelisation When implemented well offers the best performance Considerably easier to implement. Only targets very specific loops where dependencies are fully understood Difficult to learn, harder to master. Still requires some skill, practise, and trial and error. Very Manual
19 BACKGROUND: OPENACC
20 BACKGROUND: OPENACC
21 BACKGROUND: OPENACC x20 Speed Up
22 OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES OPENACC_GI
23 OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES OPENACC_GI Creates Patch
24 OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES OPENACC_GI Creates Patch CFG-GP
25 OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES OPENACC_GI Creates Patch CFG-GP FITNESS FUNCTION Patch
26 OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES OPENACC_GI GRAMMAR Creates Patch CFG-GP FITNESS FUNCTION Patch
27 OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES OPENACC_GI OPENACC GRAMMAR GRAMMAR Creates Patch CFG-GP FITNESS FUNCTION Patch
28 OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES OPENACC_GI OPENACC GRAMMAR PROGRAM DATA GRAMMAR Creates Patch CFG-GP FITNESS FUNCTION Patch
29 OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES OPENACC_GI OPENACC GRAMMAR PROGRAM DATA GRAMMAR LEXICAL ANALYSER SOURCE CODE Creates Patch CFG-GP FITNESS FUNCTION Patch
30 GRAMMAR <start> ::= <base> <base> <start> <base> ::= "#pragma acc " <choice> <choice> ::= "loop "<private> <loop_line_number> <private> ::= "private(" <variables> ") " " " <variables> ::= <variable> <variable> "," <variables> <variable> ::= <variable_placeholder> <variable_placeholder> ::= "1" "2" "3" "4" "5" "6" "7"
31 GRAMMAR <start> ::= <base> <base> <start> <base> ::= "#pragma acc " <choice> <choice> ::= "loop "<private> <loop_line_number> <private> ::= "private(" <variables> ") " " " <variables> ::= <variable> <variable> "," <variables> <variable> ::= <variable_placeholder> <variable_placeholder> ::= "1" "2" "3" "4" "5" "6" "7" <loop_line_number> ::=
32 GRAMMAR <start> ::= <base> <base> <start> <base> ::= "#pragma acc " <choice> <choice> ::= "loop "<private> <loop_line_number> <private> ::= "private(" <variables> ") " " " <variables> ::= <variable> <variable> "," <variables> <variable> ::= <variable_placeholder> <variable_placeholder> ::= "1" "2" "3" "4" "5" "6" "7" <loop_line_number> ::= #pragma acc loop private(1,2)
33 GRAMMAR <start> ::= <base> <base> <start> <base> ::= "#pragma acc " <choice> <choice> ::= "loop "<private> <loop_line_number> <private> ::= "private(" <variables> ") " " " <variables> ::= <variable> <variable> "," <variables> <variable> ::= <variable_placeholder> <variable_placeholder> ::= "1" "2" "3" "4" "5" "6" "7" <loop_line_number> ::=
34 GRAMMAR <start> ::= <base> <base> <start> <base> ::= "#pragma acc " <choice> <choice> ::= "loop "<private> <loop_line_number> <private> ::= "private(" <variables> ") " " " <variables> ::= <variable> <variable> "," <variables> <variable> ::= <variable_placeholder> <variable_placeholder> ::= "1" "2" "3" "4" "5" "6" "7" <loop_line_number> ::= - example1.c ,0 + #pragma acc loop private(x,y)
35 INITIAL INVESTIGATION Chose to run a very small example as a sanity check nvidia provide an n-body simulation example already containing OpenACC directives These directives were stripped for openacc to replicate Ran for 100 generations with population of 100
36 Execution Time (ms) RESULTS sequential original gi_best
37 Execution Time (ms) RESULTS original gi_best
38 RESULTS: OTHER NOTES Seems like much of the gain is due to random search We d like to be able to beat human-written alternatives This example is very small, future investigations will show how well the tool scales Elite Performance (ms) Generation
39 CURRENT/FUTURE WORK Currently applying the tool to larger At present can only work with C/C++, expanding code to work with FORTRAN Possible Improvements: Seed initial generation with basic solutions Introduce some clever profiling Get working with OpenMP as well as OpenACC
40 optimal 300 original Elite Performance (ms) sequential Execution Time (ms) Execution Time (ms) ANY QUESTIONS? 0 original optimal Generation
Introduction to OpenMP
Introduction to OpenMP Lecture 2: OpenMP fundamentals Overview Basic Concepts in OpenMP History of OpenMP Compiling and running OpenMP programs 2 1 What is OpenMP? OpenMP is an API designed for programming
More informationResearch Note. Towards automatic generation and insertion of OpenACC directives
UCL DEPARTMENT OF COMPUTER SCIENCE Research Note RN/18/04 Towards automatic generation and insertion of OpenACC directives April 12, 2018 Bobby R. Bruce Justyna Petke Abstract While the utilisation of
More informationComputer Science, UCL, London
Genetically Improved CUDA C++ Software W. B. Langdon Computer Science, UCL, London 26.4.2014 Genetically Improved CUDA C++ Software W. B. Langdon Centre for Research on Evolution, Search and Testing Computer
More informationOpenMP 4.0/4.5. Mark Bull, EPCC
OpenMP 4.0/4.5 Mark Bull, EPCC OpenMP 4.0/4.5 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all
More informationParsing in Parallel on Multiple Cores and GPUs
1/28 Parsing in Parallel on Multiple Cores and GPUs Mark Johnson Centre for Language Sciences and Department of Computing Macquarie University ALTA workshop December 2011 Why parse in parallel? 2/28 The
More informationA common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...
OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.
More informationOpenACC. Part I. Ned Nedialkov. McMaster University Canada. October 2016
OpenACC. Part I Ned Nedialkov McMaster University Canada October 2016 Outline Introduction Execution model Memory model Compiling pgaccelinfo Example Speedups Profiling c 2016 Ned Nedialkov 2/23 Why accelerators
More informationParallel Programming. Libraries and Implementations
Parallel Programming Libraries and Implementations Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationOpenACC 2.6 Proposed Features
OpenACC 2.6 Proposed Features OpenACC.org June, 2017 1 Introduction This document summarizes features and changes being proposed for the next version of the OpenACC Application Programming Interface, tentatively
More informationIntroduction to OpenMP. Lecture 2: OpenMP fundamentals
Introduction to OpenMP Lecture 2: OpenMP fundamentals Overview 2 Basic Concepts in OpenMP History of OpenMP Compiling and running OpenMP programs What is OpenMP? 3 OpenMP is an API designed for programming
More informationOpenMP: Open Multiprocessing
OpenMP: Open Multiprocessing Erik Schnetter June 7, 2012, IHPC 2012, Iowa City Outline 1. Basic concepts, hardware architectures 2. OpenMP Programming 3. How to parallelise an existing code 4. Advanced
More information1 of 6 Lecture 7: March 4. CISC 879 Software Support for Multicore Architectures Spring Lecture 7: March 4, 2008
1 of 6 Lecture 7: March 4 CISC 879 Software Support for Multicore Architectures Spring 2008 Lecture 7: March 4, 2008 Lecturer: Lori Pollock Scribe: Navreet Virk Open MP Programming Topics covered 1. Introduction
More informationLittle Motivation Outline Introduction OpenMP Architecture Working with OpenMP Future of OpenMP End. OpenMP. Amasis Brauch German University in Cairo
OpenMP Amasis Brauch German University in Cairo May 4, 2010 Simple Algorithm 1 void i n c r e m e n t e r ( short a r r a y ) 2 { 3 long i ; 4 5 for ( i = 0 ; i < 1000000; i ++) 6 { 7 a r r a y [ i ]++;
More informationAn innovative compilation tool-chain for embedded multi-core architectures M. Torquati, Computer Science Departmente, Univ.
An innovative compilation tool-chain for embedded multi-core architectures M. Torquati, Computer Science Departmente, Univ. Of Pisa Italy 29/02/2012, Nuremberg, Germany ARTEMIS ARTEMIS Joint Joint Undertaking
More informationA common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...
OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.
More informationOpenMP 4.0. Mark Bull, EPCC
OpenMP 4.0 Mark Bull, EPCC OpenMP 4.0 Version 4.0 was released in July 2013 Now available in most production version compilers support for device offloading not in all compilers, and not for all devices!
More informationIntroduction to OpenMP
Introduction to OpenMP Lecture 9: Performance tuning Sources of overhead There are 6 main causes of poor performance in shared memory parallel programs: sequential code communication load imbalance synchronisation
More informationS Comparing OpenACC 2.5 and OpenMP 4.5
April 4-7, 2016 Silicon Valley S6410 - Comparing OpenACC 2.5 and OpenMP 4.5 James Beyer, NVIDIA Jeff Larkin, NVIDIA GTC16 April 7, 2016 History of OpenMP & OpenACC AGENDA Philosophical Differences Technical
More informationLecture 2: Introduction to OpenMP with application to a simple PDE solver
Lecture 2: Introduction to OpenMP with application to a simple PDE solver Mike Giles Mathematical Institute Mike Giles Lecture 2: Introduction to OpenMP 1 / 24 Hardware and software Hardware: a processor
More informationTrends in HPC (hardware complexity and software challenges)
Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18
More informationAdvanced OpenMP. Lecture 11: OpenMP 4.0
Advanced OpenMP Lecture 11: OpenMP 4.0 OpenMP 4.0 Version 4.0 was released in July 2013 Starting to make an appearance in production compilers What s new in 4.0 User defined reductions Construct cancellation
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationParallelising Scientific Codes Using OpenMP. Wadud Miah Research Computing Group
Parallelising Scientific Codes Using OpenMP Wadud Miah Research Computing Group Software Performance Lifecycle Scientific Programming Early scientific codes were mainly sequential and were executed on
More informationOpenACC Fundamentals. Steve Abbott November 13, 2016
OpenACC Fundamentals Steve Abbott , November 13, 2016 Who Am I? 2005 B.S. Physics Beloit College 2007 M.S. Physics University of Florida 2015 Ph.D. Physics University of New Hampshire
More informationPORTING CP2K TO THE INTEL XEON PHI. ARCHER Technical Forum, Wed 30 th July Iain Bethune
PORTING CP2K TO THE INTEL XEON PHI ARCHER Technical Forum, Wed 30 th July Iain Bethune (ibethune@epcc.ed.ac.uk) Outline Xeon Phi Overview Porting CP2K to Xeon Phi Performance Results Lessons Learned Further
More informationPragma-based GPU Programming and HMPP Workbench. Scott Grauer-Gray
Pragma-based GPU Programming and HMPP Workbench Scott Grauer-Gray Pragma-based GPU programming Write programs for GPU processing without (directly) using CUDA/OpenCL Place pragmas to drive processing on
More informationDirected Optimization On Stencil-based Computational Fluid Dynamics Application(s)
Directed Optimization On Stencil-based Computational Fluid Dynamics Application(s) Islam Harb 08/21/2015 Agenda Motivation Research Challenges Contributions & Approach Results Conclusion Future Work 2
More informationA Short Introduction to OpenMP. Mark Bull, EPCC, University of Edinburgh
A Short Introduction to OpenMP Mark Bull, EPCC, University of Edinburgh Overview Shared memory systems Basic Concepts in Threaded Programming Basics of OpenMP Parallel regions Parallel loops 2 Shared memory
More informationParallel Programming Libraries and implementations
Parallel Programming Libraries and implementations Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.
More informationHybrid Model Parallel Programs
Hybrid Model Parallel Programs Charlie Peck Intermediate Parallel Programming and Cluster Computing Workshop University of Oklahoma/OSCER, August, 2010 1 Well, How Did We Get Here? Almost all of the clusters
More informationINTRODUCTION TO COMPILER DIRECTIVES WITH OPENACC
INTRODUCTION TO COMPILER DIRECTIVES WITH OPENACC DR. CHRISTOPH ANGERER, NVIDIA *) THANKS TO JEFF LARKIN, NVIDIA, FOR THE SLIDES 3 APPROACHES TO GPU PROGRAMMING Applications Libraries Compiler Directives
More informationOpenMP: Open Multiprocessing
OpenMP: Open Multiprocessing Erik Schnetter May 20-22, 2013, IHPC 2013, Iowa City 2,500 BC: Military Invents Parallelism Outline 1. Basic concepts, hardware architectures 2. OpenMP Programming 3. How to
More informationHow GPUs can find your next hit: Accelerating virtual screening with OpenCL. Simon Krige
How GPUs can find your next hit: Accelerating virtual screening with OpenCL Simon Krige ACS 2013 Agenda > Background > About blazev10 > What is a GPU? > Heterogeneous computing > OpenCL: a framework for
More informationThe Role of Standards in Heterogeneous Programming
The Role of Standards in Heterogeneous Programming Multi-core Challenge Bristol UWE 45 York Place, Edinburgh EH1 3HP June 12th, 2013 Codeplay Software Ltd. Incorporated in 1999 Based in Edinburgh, Scotland
More informationOPENMP TIPS, TRICKS AND GOTCHAS
OPENMP TIPS, TRICKS AND GOTCHAS OpenMPCon 2015 2 Directives Mistyping the sentinel (e.g.!omp or #pragma opm ) typically raises no error message. Be careful! Extra nasty if it is e.g. #pragma opm atomic
More informationNVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU
NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU GPGPU opens the door for co-design HPC, moreover middleware-support embedded system designs to harness the power of GPUaccelerated
More informationNVIDIA s Compute Unified Device Architecture (CUDA)
NVIDIA s Compute Unified Device Architecture (CUDA) Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability 1 History of GPU
More informationNVIDIA s Compute Unified Device Architecture (CUDA)
NVIDIA s Compute Unified Device Architecture (CUDA) Mike Bailey mjb@cs.oregonstate.edu Reaching the Promised Land NVIDIA GPUs CUDA Knights Corner Speed Intel CPUs General Programmability History of GPU
More informationParallel Programming. Libraries and implementations
Parallel Programming Libraries and implementations Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationCSE 4342/ 5343 Lab 1: Introduction
CSE 4342/ 5343 Lab 1: Introduction Objectives: Familiarize with the lab equipments. Login to the embedded lab network and locate shared files for the lab. Write a sample program using Intel compiler. Run
More informationINTRODUCTION TO ACCELERATED COMPUTING WITH OPENACC. Jeff Larkin, NVIDIA Developer Technologies
INTRODUCTION TO ACCELERATED COMPUTING WITH OPENACC Jeff Larkin, NVIDIA Developer Technologies AGENDA Accelerated Computing Basics What are Compiler Directives? Accelerating Applications with OpenACC Identifying
More informationA GPU based brute force de-dispersion algorithm for LOFAR
A GPU based brute force de-dispersion algorithm for LOFAR W. Armour, M. Giles, A. Karastergiou and C. Williams. University of Oxford. 8 th May 2012 1 GPUs Why use GPUs? Latest Kepler/Fermi based cards
More informationImproving 3D Medical Image Registration CUDA Software with Genetic Programming
Improving 3D Medical Image Registration CUDA Software with Genetic Programming W. B. Langdon Centre for Research on Evolution, Search and Testing Computer Science, UCL, London GISMOE: Genetic Improvement
More informationCray XE6 Performance Workshop
Cray XE6 Performance Workshop Multicore Programming Overview Shared memory systems Basic Concepts in OpenMP Brief history of OpenMP Compiling and running OpenMP programs 2 1 Shared memory systems OpenMP
More informationPerformance Analysis of the PSyKAl Approach for a NEMO-based Benchmark
Performance Analysis of the PSyKAl Approach for a NEMO-based Benchmark Mike Ashworth, Rupert Ford and Andrew Porter Scientific Computing Department and STFC Hartree Centre STFC Daresbury Laboratory United
More informationAutomatic Testing of OpenACC Applications
Automatic Testing of OpenACC Applications Khalid Ahmad School of Computing/University of Utah Michael Wolfe NVIDIA/PGI November 13 th, 2017 Why Test? When optimizing or porting Validate the optimization
More informationUsing OpenACC in IFS Physics Cloud Scheme (CLOUDSC) Sami Saarinen ECMWF Basic GPU Training Sept 16-17, 2015
Using OpenACC in IFS Physics Cloud Scheme (CLOUDSC) Sami Saarinen ECMWF Basic GPU Training Sept 16-17, 2015 Slide 1 Background Back in 2014 : Adaptation of IFS physics cloud scheme (CLOUDSC) to new architectures
More informationParallel Computing Using OpenMP/MPI. Presented by - Jyotsna 29/01/2008
Parallel Computing Using OpenMP/MPI Presented by - Jyotsna 29/01/2008 Serial Computing Serially solving a problem Parallel Computing Parallelly solving a problem Parallel Computer Memory Architecture Shared
More informationObjective. GPU Teaching Kit. OpenACC. To understand the OpenACC programming model. Introduction to OpenACC
GPU Teaching Kit Accelerated Computing OpenACC Introduction to OpenACC Objective To understand the OpenACC programming model basic concepts and pragma types simple examples 2 2 OpenACC The OpenACC Application
More informationMiwako TSUJI XcalableMP
Miwako TSUJI AICS 2014.10.24 2 XcalableMP 2010.09 2014.03 2013.10.25 AKIHABARA FP2C (Framework for Post-Petascale Computing) YML + XMP(-dev) + StarPU integrated developed in Japan and in France Experimental
More informationDawnCC : a Source-to-Source Automatic Parallelizer of C and C++ Programs
DawnCC : a Source-to-Source Automatic Parallelizer of C and C++ Programs Breno Campos Ferreira Guimarães, Gleison Souza Diniz Mendonça, Fernando Magno Quintão Pereira 1 Departamento de Ciência da Computação
More informationCONTINUED EFFORTS IN ADAPTING THE GEOS- 5 AGCM TO ACCELERATORS: SUCCESSES AND CHALLENGES
CONTINUED EFFORTS IN ADAPTING THE GEOS- 5 AGCM TO ACCELERATORS: SUCCESSES AND CHALLENGES 9/20/2013 Matt Thompson matthew.thompson@nasa.gov Accelerator Conversion Aims Code rewrites will probably be necessary,
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationOPENMP TIPS, TRICKS AND GOTCHAS
OPENMP TIPS, TRICKS AND GOTCHAS Mark Bull EPCC, University of Edinburgh (and OpenMP ARB) markb@epcc.ed.ac.uk OpenMPCon 2015 OpenMPCon 2015 2 A bit of background I ve been teaching OpenMP for over 15 years
More informationAn Introduction to OpenAcc
An Introduction to OpenAcc ECS 158 Final Project Robert Gonzales Matthew Martin Nile Mittow Ryan Rasmuss Spring 2016 1 Introduction: What is OpenAcc? OpenAcc stands for Open Accelerators. Developed by
More informationINTRODUCTION TO OPENACC. Analyzing and Parallelizing with OpenACC, Feb 22, 2017
INTRODUCTION TO OPENACC Analyzing and Parallelizing with OpenACC, Feb 22, 2017 Objective: Enable you to to accelerate your applications with OpenACC. 2 Today s Objectives Understand what OpenACC is and
More informationGrid-Based Genetic Algorithm Approach to Colour Image Segmentation
Grid-Based Genetic Algorithm Approach to Colour Image Segmentation Marco Gallotta Keri Woods Supervised by Audrey Mbogho Image Segmentation Identifying and extracting distinct, homogeneous regions from
More informationOpenACC Fundamentals. Steve Abbott November 15, 2017
OpenACC Fundamentals Steve Abbott , November 15, 2017 AGENDA Data Regions Deep Copy 2 while ( err > tol && iter < iter_max ) { err=0.0; JACOBI ITERATION #pragma acc parallel loop reduction(max:err)
More informationParallel Systems I The GPU architecture. Jan Lemeire
Parallel Systems I The GPU architecture Jan Lemeire 2012-2013 Sequential program CPU pipeline Sequential pipelined execution Instruction-level parallelism (ILP): superscalar pipeline out-of-order execution
More informationGPUs and Emerging Architectures
GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs
More informationHARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA
HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES Cliff Woolley, NVIDIA PREFACE This talk presents a case study of extracting parallelism in the UMT2013 benchmark for 3D unstructured-mesh
More informationAMD Elite A-Series APU Desktop LAUNCHING JUNE 4 TH PLACE YOUR ORDERS TODAY!
AMD Elite A-Series APU Desktop LAUNCHING JUNE 4 TH PLACE YOUR ORDERS TODAY! INTRODUCING THE APU: DIFFERENT THAN CPUS A new AMD led category of processor APUs Are Their OWN Category + = Up to 779 GFLOPS
More informationParallel Computing. November 20, W.Homberg
Mitglied der Helmholtz-Gemeinschaft Parallel Computing November 20, 2017 W.Homberg Why go parallel? Problem too large for single node Job requires more memory Shorter time to solution essential Better
More informationCOMP4510 Introduction to Parallel Computation. Shared Memory and OpenMP. Outline (cont d) Shared Memory and OpenMP
COMP4510 Introduction to Parallel Computation Shared Memory and OpenMP Thanks to Jon Aronsson (UofM HPC consultant) for some of the material in these notes. Outline (cont d) Shared Memory and OpenMP Including
More informationAn Execution Strategy and Optimized Runtime Support for Parallelizing Irregular Reductions on Modern GPUs
An Execution Strategy and Optimized Runtime Support for Parallelizing Irregular Reductions on Modern GPUs Xin Huo, Vignesh T. Ravi, Wenjing Ma and Gagan Agrawal Department of Computer Science and Engineering
More informationThe Art of Parallel Processing
The Art of Parallel Processing Ahmad Siavashi April 2017 The Software Crisis As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a
More informationComparing OpenACC 2.5 and OpenMP 4.1 James C Beyer PhD, Sept 29 th 2015
Comparing OpenACC 2.5 and OpenMP 4.1 James C Beyer PhD, Sept 29 th 2015 Abstract As both an OpenMP and OpenACC insider I will present my opinion of the current status of these two directive sets for programming
More informationOpenACC (Open Accelerators - Introduced in 2012)
OpenACC (Open Accelerators - Introduced in 2012) Open, portable standard for parallel computing (Cray, CAPS, Nvidia and PGI); introduced in 2012; GNU has an incomplete implementation. Uses directives in
More informationOpenACC Course. Office Hour #2 Q&A
OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle
More informationParallelism III. MPI, Vectorization, OpenACC, OpenCL. John Cavazos,Tristan Vanderbruggen, and Will Killian
Parallelism III MPI, Vectorization, OpenACC, OpenCL John Cavazos,Tristan Vanderbruggen, and Will Killian Dept of Computer & Information Sciences University of Delaware 1 Lecture Overview Introduction MPI
More informationCACHE DIRECTIVE OPTIMIZATION IN THE OPENACC PROGRAMMING MODEL. Xiaonan (Daniel) Tian, Brent Leback, and Michael Wolfe PGI
CACHE DIRECTIVE OPTIMIZATION IN THE OPENACC PROGRAMMING MODEL Xiaonan (Daniel) Tian, Brent Leback, and Michael Wolfe PGI GPU ARCHITECTURE Threads Register Files Shared Memory L1 Cache Read-Only Data Cache
More informationDirective-based Programming for Highly-scalable Nodes
Directive-based Programming for Highly-scalable Nodes Doug Miles Michael Wolfe PGI Compilers & Tools NVIDIA Cray User Group Meeting May 2016 Talk Outline Increasingly Parallel Nodes Exposing Parallelism
More informationParaFormance TM : An Advanced Refactoring Tool for Parallelising C++ Programs Part 1
ParaFormance TM : An Advanced Refactoring Tool for Parallelising C++ Programs Part 1 Chris Brown, Vladimir Janjic, Kenneth MacKenzie, Kevin Hammond University of St Andrews, Scotland @chrismarkbrown @rephrase_eu
More informationEvolving a CUDA Kernel from an nvidia Template
Evolving a CUDA Kernel from an nvidia Template W. B. Langdon CREST lab, Department of Computer Science 16a.7.2010 Introduction Using genetic programming to create C source code How? Why? Proof of concept:
More informationApplying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems
V Brazilian Symposium on Computing Systems Engineering Applying Multi-Core Model Checking to Hardware-Software Partitioning in Embedded Systems Alessandro Trindade, Hussama Ismail, and Lucas Cordeiro Foz
More informationHigh Performance Computing on GPUs using NVIDIA CUDA
High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and
More informationCompiling for GPUs. Adarsh Yoga Madhav Ramesh
Compiling for GPUs Adarsh Yoga Madhav Ramesh Agenda Introduction to GPUs Compute Unified Device Architecture (CUDA) Control Structure Optimization Technique for GPGPU Compiler Framework for Automatic Translation
More informationGPU ARCHITECTURE Chris Schultz, June 2017
GPU ARCHITECTURE Chris Schultz, June 2017 MISC All of the opinions expressed in this presentation are my own and do not reflect any held by NVIDIA 2 OUTLINE CPU versus GPU Why are they different? CUDA
More informationProfiling and Parallelizing with the OpenACC Toolkit OpenACC Course: Lecture 2 October 15, 2015
Profiling and Parallelizing with the OpenACC Toolkit OpenACC Course: Lecture 2 October 15, 2015 Oct 1: Introduction to OpenACC Oct 6: Office Hours Oct 15: Profiling and Parallelizing with the OpenACC Toolkit
More informationOpenMP 4.0 implementation in GCC. Jakub Jelínek Consulting Engineer, Platform Tools Engineering, Red Hat
OpenMP 4.0 implementation in GCC Jakub Jelínek Consulting Engineer, Platform Tools Engineering, Red Hat OpenMP 4.0 implementation in GCC Work started in April 2013, C/C++ support with host fallback only
More informationGPU GPU CPU. Raymond Namyst 3 Samuel Thibault 3 Olivier Aumage 3
/CPU,a),2,2 2,2 Raymond Namyst 3 Samuel Thibault 3 Olivier Aumage 3 XMP XMP-dev CPU XMP-dev/StarPU XMP-dev XMP CPU StarPU CPU /CPU XMP-dev/StarPU N /CPU CPU. Graphics Processing Unit GP General-Purpose
More informationIntroduction to MPI. EAS 520 High Performance Scientific Computing. University of Massachusetts Dartmouth. Spring 2014
Introduction to MPI EAS 520 High Performance Scientific Computing University of Massachusetts Dartmouth Spring 2014 References This presentation is almost an exact copy of Dartmouth College's Introduction
More informationEquipment Purchase Advice. Mark Jones
Equipment Purchase Advice Mark Jones 22.09.2017 Question 1 What hardware do I need to get me through college? Questions Question 2 How much do I have to spend on this hardware? Question 3 Where should
More informationModern GPU Programming With CUDA and Thrust. Gilles Civario (ICHEC)
Modern GPU Programming With CUDA and Thrust Gilles Civario (ICHEC) gilles.civario@ichec.ie Let me introduce myself ID: Gilles Civario Gender: Male Age: 40 Affiliation: ICHEC Specialty: HPC Mission: CUDA
More informationProgress on GPU Parallelization of the NIM Prototype Numerical Weather Prediction Dynamical Core
Progress on GPU Parallelization of the NIM Prototype Numerical Weather Prediction Dynamical Core Tom Henderson NOAA/OAR/ESRL/GSD/ACE Thomas.B.Henderson@noaa.gov Mark Govett, Jacques Middlecoff Paul Madden,
More informationA General Discussion on! Parallelism!
Lecture 2! A General Discussion on! Parallelism! John Cavazos! Dept of Computer & Information Sciences! University of Delaware! www.cis.udel.edu/~cavazos/cisc879! Lecture 2: Overview Flynn s Taxonomy of
More informationOPENACC ONLINE COURSE 2018
OPENACC ONLINE COURSE 2018 Week 3 Loop Optimizations with OpenACC Jeff Larkin, Senior DevTech Software Engineer, NVIDIA ABOUT THIS COURSE 3 Part Introduction to OpenACC Week 1 Introduction to OpenACC Week
More informationOpenACC Accelerator Directives. May 3, 2013
OpenACC Accelerator Directives May 3, 2013 OpenACC is... An API Inspired by OpenMP Implemented by Cray, PGI, CAPS Includes functions to query device(s) Evolving Plan to integrate into OpenMP Support of
More informationReusing this material
XEON PHI BASICS Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationShared Memory Programming with OpenMP
Shared Memory Programming with OpenMP (An UHeM Training) Süha Tuna Informatics Institute, Istanbul Technical University February 12th, 2016 2 Outline - I Shared Memory Systems Threaded Programming Model
More informationOverview of Project's Achievements
PalDMC Parallelised Data Mining Components Final Presentation ESRIN, 12/01/2012 Overview of Project's Achievements page 1 Project Outline Project's objectives design and implement performance optimised,
More informationThe Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer
The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer William Killian Tom Scogland, Adam Kunen John Cavazos Millersville University of Pennsylvania
More informationOverview of research activities Toward portability of performance
Overview of research activities Toward portability of performance Do dynamically what can t be done statically Understand evolution of architectures Enable new programming models Put intelligence into
More informationProgramming Parallel Computers
ICS-E4020 Programming Parallel Computers Jukka Suomela Jaakko Lehtinen Samuli Laine Aalto University Spring 2015 users.ics.aalto.fi/suomela/ppc-2015/ Introduction Modern computers have high-performance
More informationWebinar Series: Triangulate your Storage Architecture with SvSAN Caching. Luke Pruen Technical Services Director
Webinar Series: Triangulate your Storage Architecture with SvSAN Caching Luke Pruen Technical Services Director What can you expect from this webinar? To answer a simple question How can I create the perfect
More informationSMARTPHONE HARDWARE: ANATOMY OF A HANDSET. Mainak Chaudhuri Indian Institute of Technology Kanpur Commonwealth of Learning Vancouver
SMARTPHONE HARDWARE: ANATOMY OF A HANDSET Mainak Chaudhuri Indian Institute of Technology Kanpur Commonwealth of Learning Vancouver Outline of topics What is the hardware architecture of a How does communication
More informationParallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework and numa control Examples
More informationOpenACC programming for GPGPUs: Rotor wake simulation
DLR.de Chart 1 OpenACC programming for GPGPUs: Rotor wake simulation Melven Röhrig-Zöllner, Achim Basermann Simulations- und Softwaretechnik DLR.de Chart 2 Outline Hardware-Architecture (CPU+GPU) GPU computing
More informationIntroduction to OpenMP
Introduction to OpenMP Lecture 4: Work sharing directives Work sharing directives Directives which appear inside a parallel region and indicate how work should be shared out between threads Parallel do/for
More informationgpot: Intelligent Compiler for GPGPU using Combinatorial Optimization Techniques
gpot: Intelligent Compiler for GPGPU using Combinatorial Optimization Techniques Yuta TOMATSU, Tomoyuki HIROYASU, Masato YOSHIMI, Mitsunori MIKI Graduate Student of School of Ewngineering, Faculty of Department
More information