A PSyclone perspective of the big picture. Rupert Ford, STFC Hartree Centre
2. Requirement I. Maintainable software: "maintain codes in a way that subject matter experts can still modify the code" (Leslie Hart from NOAA, presenting at NCAR MultiCore 2016); "want to use a single source code for all architectures" (Ulrich Schättler from DWD, presenting at NCAR MultiCore 2015). Key aim: single source science code.
3. Requirement II. High performance: efficient and scalable software on current and future HPC architectures. "We want performance on multiple architectures and maintainable code" (Matthew Norman from ORNL, presenting at NCAR MultiCore 2015). Key aim: performance portability.
4. What is the problem? I. Codes: complex and evolving science; it is hard to restructure codes. Software: software has a long life (20+ years), so it outlives hardware architectures; compilers are complex and evolving; standards are evolving (OpenMP, OpenACC, MPI, ...).
5. What is the problem? II. Hardware: multiple levels of parallelism (inter-node (MPI), intra-node (OpenMP/OpenACC), SIMD vectorisation, oversubscription); heterogeneity is coming; very different hardware solutions (many-core vs. GPU); hardware solutions change rapidly; memory bandwidth is increasing, but so is memory latency; memory directives coming?
6. Where are we now? Current best practice: code for hierarchies of parallelism (loops, blocks, partition); use standard directives (OpenMP/OpenACC); optimise separately for many-core/GPU; try to minimise code differences. In practice: large legacy (MPI) code that is difficult to modify; OpenMP for oversubscription only (or none); few usable GPU implementations; HPC experts optimise for the latest architecture.
7. Continue as we are?
8. Is there an alternative? Libraries: general (MPI, NetCDF, BLAS); domain-specific (PIO, MCT, OASIS3, ESMF (infrastructure)). Threading abstraction (MPI + X): OCCA (targets OpenMP, CUDA, OpenCL, OpenACC); Kokkos (C++, targets OpenMP, CUDA). These are performance portable for the given parallelisation strategy.
9. Optimised code. Complex parallel code + complex parallel architectures + complex compilers = a complex optimisation space. Do we really expect there to be a single minimum?
10. Simple compiler example
11. Optimised code. "Code changes to get good performance were invasive, with likely impacts to the CPU, MIC performance" (Mark Govett from NOAA, presenting at NCAR MultiCore 2016 on FV3 for GPUs); "Exact same code performing on all architectures is a pipe dream" (Matthew Norman from ORNL, presenting at NCAR MultiCore 2015). Single source optimised code is not attainable.
12. Is there a solution? Separation of concerns: separate science code from parallelisation and optimisation; a single science source; targeted parallelisation and optimisation for portable performance. This is achievable using domain-specific knowledge.
13. Domain-specific knowledge I. Finite element/volume/difference-specific: operations over a mesh; typically the same operation at each element/volume/point; data parallel (typically independent operations); nearest-neighbour communications for stencils; global reduction(s) for convergence and/or conservation.
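The mesh-level pattern on this slide (the same operation at every point, nearest-neighbour halo communication for a stencil, and a global reduction) can be illustrated with a small single-process Python sketch. It is not PSyclone output: partitions are emulated as Python lists rather than MPI ranks, the mesh is a hypothetical periodic 1D line, and `split`, `halo_exchange`, `smooth`, `global_sum` and `step` are illustrative names.

```python
from typing import List, Tuple

Partition = List[float]  # [left halo, owned cells..., right halo]


def split(field: List[float], nparts: int) -> List[Partition]:
    """Block-partition a periodic 1D field, adding one halo cell per side.

    Assumes len(field) is divisible by nparts, to keep the sketch short.
    """
    size = len(field) // nparts
    return [[0.0] + field[p * size:(p + 1) * size] + [0.0] for p in range(nparts)]


def halo_exchange(parts: List[Partition]) -> None:
    """Fill each halo from the neighbouring partition's owned cells (periodic)."""
    nparts = len(parts)
    for p, part in enumerate(parts):
        part[0] = parts[(p - 1) % nparts][-2]  # last owned cell on the left
        part[-1] = parts[(p + 1) % nparts][1]  # first owned cell on the right


def smooth(part: Partition) -> List[float]:
    """The same 3-point stencil at every owned point (needs valid halos)."""
    return [0.25 * part[i - 1] + 0.5 * part[i] + 0.25 * part[i + 1]
            for i in range(1, len(part) - 1)]


def global_sum(local_sums: List[float]) -> float:
    """Stand-in for a global reduction such as MPI_Allreduce with MPI_SUM."""
    return sum(local_sums)


def step(field: List[float], nparts: int) -> Tuple[List[float], float]:
    parts = split(field, nparts)
    halo_exchange(parts)                         # nearest-neighbour communication
    owned = [smooth(part) for part in parts]     # independent per partition
    total = global_sum([sum(o) for o in owned])  # e.g. a conservation check
    return [x for o in owned for x in o], total


if __name__ == "__main__":
    field = [float(i * i) for i in range(8)]
    serial, _ = step(field, 1)
    parallel, total = step(field, 4)
    print(serial == parallel, total)  # True 140.0
```

Because each partition only needs its halo cells, the four-partition run reproduces the single-partition result; this particular stencil also conserves the global sum, which the reduction checks.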
14. Domain-specific knowledge II. Weather/climate-specific: fixed mesh topology; structured or semi-structured (quasi-uniform) mesh; vertical resolution << horizontal resolution; 2D + 1D mesh (extrusion), with structured or unstructured data in the horizontal and structured data in the vertical; dynamics mostly independent in the vertical; physics mostly independent in the horizontal; parallelise in the horizontal.
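A toy extruded (2D + 1D) mesh makes the horizontal/vertical split concrete. The sketch assumes a hypothetical four-cell horizontal mesh described by a neighbour list (so the horizontal is treated as unstructured) and a structured vertical stored contiguously per column. The dynamics-like and physics-like routines only illustrate the slide's claims; they are not real model code.

```python
from typing import Dict, List

# Hypothetical horizontal mesh: 4 cells in a ring, described by an explicit
# neighbour list (i.e. treated as unstructured), extruded into NLEV levels.
NEIGHBOURS: Dict[int, List[int]] = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
NLEV = 3

Field = List[List[float]]  # field[column][level]: each column is contiguous


def column_integral(field: Field, dz: float) -> List[float]:
    """Physics-like: each column is processed on its own, with no horizontal
    dependence, so the loop over columns can be parallelised."""
    return [sum(value * dz for value in column) for column in field]


def neighbour_mean(field: Field) -> Field:
    """Dynamics-like: each value is averaged with its horizontal neighbours on
    the same level; levels do not depend on each other."""
    result: Field = []
    for col in range(len(field)):
        nbrs = NEIGHBOURS[col]
        result.append([
            (field[col][k] + sum(field[n][k] for n in nbrs)) / (1 + len(nbrs))
            for k in range(NLEV)
        ])
    return result


if __name__ == "__main__":
    field = [[float(c)] * NLEV for c in range(len(NEIGHBOURS))]
    print(column_integral(field, 0.5))  # [0.0, 1.5, 3.0, 4.5]
    print(neighbour_mean(field)[1])     # [1.0, 1.0, 1.0]
```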
15. Existing DSLs I. Two main DSLs are being used/developed by major centres for use in this domain. Stella/GridTools: designed to support FE/FV/FD ESM dynamical core stencils. PSyclone/LFRic: designed to allow support for FE/FV/FD ESM models. Also Firedrake: a more general-purpose DSL for FE.
16. Existing DSLs II. PSyclone and Firedrake add any required communication (halo communication and global reductions); Stella? PSyclone will be investigated for use with physics. PSyclone is designed to support multiple APIs: lfric and gocean (2D, NEMO-esque). Could PSyclone use Stella? Could PSyclone/Stella use OCCA/Kokkos?
17. DSL approach. Logically global view at the Algorithm level: no iteration over the mesh; no reference to parallelism; operations on full fields. Unit of work at the Kernel level: a mesh point/element or column, which can be run in any order. A domain-specific compiler takes the Algorithm and Kernel specifications and generates architecture-specific optimised parallel code.
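The Algorithm/Kernel split, with the generated code in between (what PSyclone calls the PSy layer), can be mimicked in plain Python. This is a minimal sketch under stated assumptions: the names `scale_kernel`, `invoke_serial`, `invoke_threaded` and `algorithm` are hypothetical, the middle layer is written by hand rather than generated, and Python threads are used only to show that a different parallelisation can be swapped in without touching the kernel or the algorithm, not as a performance model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Column = List[float]
Field = List[Column]  # field[column][level]
Kernel = Callable[..., None]


# Kernel layer: the unit of work is a single column. No loop over the mesh and
# no reference to parallelism, so columns may be processed in any order.
def scale_kernel(out_col: Column, in_col: Column, a: float) -> None:
    for k, value in enumerate(in_col):
        out_col[k] = a * value


# Middle (PSy) layer: generated in the real approach, hand-written here.
# Two hypothetical variants share one kernel but differ in parallelisation.
def invoke_serial(kernel: Kernel, out: Field, inp: Field, *args) -> None:
    for c in range(len(out)):
        kernel(out[c], inp[c], *args)


def invoke_threaded(kernel: Kernel, out: Field, inp: Field, *args,
                    workers: int = 4) -> None:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every column and re-raises any kernel exception.
        list(pool.map(lambda c: kernel(out[c], inp[c], *args), range(len(out))))


# Algorithm layer: whole-field operations only. Which middle layer is used is
# decided by the tool chain, not by the science code.
def algorithm(theta: Field, invoke: Callable[..., None]) -> Field:
    doubled = [[0.0] * len(col) for col in theta]
    invoke(scale_kernel, doubled, theta, 2.0)
    return doubled


if __name__ == "__main__":
    theta = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(algorithm(theta, invoke_serial) == algorithm(theta, invoke_threaded))
```

Because the kernel never sees the loop, the serial and threaded variants give the same answer; only the generated layer changes.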
18. DSL languages I. Both PSyclone and Stella are DSELs. PSyclone is embedded in Fortran: Algorithm and Kernel code are written in Fortran. Stella is embedded in C++: Algorithm and Kernel code are written in C++.
19. DSL languages II. PSyclone rationale for Fortran. Expertise: scientists have existing expertise in Fortran. Familiarity: scientists (say they) want to continue with Fortran. Development: unsupported features can be written in Fortran. Adoption: don't move too far, as scientific adoption is key. Legacy: it should be easier to integrate existing Fortran code. Performance: Fortran is a reasonable language for HPC. "Keep code simple, portable, readable and similar to Fortran" (Ulrich Schättler from DWD, presenting at NCAR MultiCore 2015). Ninja DSL! Fortran code is just a specification; Algorithm/Kernel separation allows for different languages.
20. Coded kernels. PSyclone kernels are written in Fortran, so any science can be written; PSyclone supports built-ins where no kernel code is required. Stella kernels are written in C++; limited to stencil descriptions? Firedrake has no coded kernels and is closer to an FE description, but it offers trap-door access to manually written C kernels. Coded Fortran kernels require a data model (IJK, IKJ, KIJ, KL).
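To show why coded kernels tie into a data model, the sketch below computes flat-buffer offsets for three of the listed layouts and runs one column kernel against each. It assumes the letters give the Fortran index order (so IJK means `a(i, j, k)`), uses 0-based indices, and follows Fortran's column-major rule that the first index varies fastest; the KL layout is not covered. All function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Assumed convention: column-major storage, first listed index fastest.
Layout = Callable[[int, int, int, int, int, int], int]


def offset_ijk(i: int, j: int, k: int, ni: int, nj: int, nk: int) -> int:
    """a(i, j, k): horizontal index i is contiguous."""
    return i + ni * (j + nj * k)


def offset_ikj(i: int, j: int, k: int, ni: int, nj: int, nk: int) -> int:
    """a(i, k, j): i contiguous, then the vertical index k."""
    return i + ni * (k + nk * j)


def offset_kij(i: int, j: int, k: int, ni: int, nj: int, nk: int) -> int:
    """a(k, i, j): each vertical column is contiguous."""
    return k + nk * (i + ni * j)


def pack(values: Dict[Tuple[int, int, int], float], layout: Layout,
         ni: int, nj: int, nk: int) -> List[float]:
    """Store (i, j, k) -> value pairs in a flat buffer using the given layout."""
    buf = [0.0] * (ni * nj * nk)
    for (i, j, k), v in values.items():
        buf[layout(i, j, k, ni, nj, nk)] = v
    return buf


def column_sum(buf: List[float], layout: Layout, i: int, j: int,
               ni: int, nj: int, nk: int) -> float:
    """A column kernel written against the layout, not a fixed array shape."""
    return sum(buf[layout(i, j, k, ni, nj, nk)] for k in range(nk))


if __name__ == "__main__":
    ni, nj, nk = 2, 3, 4
    values = {(i, j, k): float(100 * i + 10 * j + k)
              for i, j, k in product(range(ni), range(nj), range(nk))}
    for layout in (offset_ijk, offset_ikj, offset_kij):
        buf = pack(values, layout, ni, nj, nk)
        print(layout.__name__, column_sum(buf, layout, 1, 2, ni, nj, nk))
```

The kernel result is the same for every layout, but the vertical stride differs (1 for KIJ, `ni * nj` for IJK), which is what makes the choice of data model matter for performance.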
21. DSL optimisations I. Delaying optimisations
22. DSL optimisations II. Large optimisation space. Can optimisations be automated? Perhaps, for reasonable performance; it is interesting to search the optimisation space. PSyclone takes a user-specified optimisation approach to support the expert: the HPC expert provides an optimisation recipe. Compile-time optimisation (static analysis).
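An optimisation recipe can be pictured as an ordered list of transformations applied to a generated loop schedule, leaving the science code untouched. The sketch below is a hypothetical model of that idea, not PSyclone's transformation API: `Loop`, `fuse_same_space`, `add_directive`, `apply_recipe` and the two recipes are invented for illustration, and loop fusion is applied without the dependence analysis a real tool would need.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Loop:
    """One loop in a (hypothetical) generated schedule."""
    space: str                        # iteration space, e.g. "cells"
    kernels: Tuple[str, ...]          # kernel calls inside the loop body
    directives: Tuple[str, ...] = ()  # directives placed around the loop


Schedule = List[Loop]
Transformation = Callable[[Schedule], Schedule]


def fuse_same_space(schedule: Schedule) -> Schedule:
    """Fuse adjacent loops over the same iteration space.

    ASSUMPTION: a real tool must first prove fusion is dependence-safe.
    """
    fused: Schedule = []
    for loop in schedule:
        prev = fused[-1] if fused else None
        if prev and prev.space == loop.space and not prev.directives and not loop.directives:
            fused[-1] = replace(prev, kernels=prev.kernels + loop.kernels)
        else:
            fused.append(loop)
    return fused


def add_directive(text: str) -> Transformation:
    """Build a transformation that wraps every loop in the given directive."""
    def transform(schedule: Schedule) -> Schedule:
        return [replace(loop, directives=loop.directives + (text,)) for loop in schedule]
    return transform


def apply_recipe(schedule: Schedule, recipe: List[Transformation]) -> Schedule:
    """Apply the expert's transformations in order; the input is not modified."""
    return reduce(lambda sched, trans: trans(sched), recipe, schedule)


# Same schedule, different expert-supplied recipes per target architecture.
CPU_RECIPE: List[Transformation] = [fuse_same_space, add_directive("omp parallel do")]
GPU_RECIPE: List[Transformation] = [add_directive("acc parallel loop")]

if __name__ == "__main__":
    sched = [Loop("cells", ("k1",)), Loop("cells", ("k2",)), Loop("edges", ("k3",))]
    for loop in apply_recipe(sched, CPU_RECIPE):
        print(loop)
```

Swapping `CPU_RECIPE` for `GPU_RECIPE` changes only the resulting schedule; the kernels and the schedule they came from stay the same.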
23. Maintenance benefits. Often overlooked. Not only single source, but also: simplified science code; a higher-level problem specification; lots of code generated (NEMO loop bounds!).
24. Have we missed anything? Functional parallelism: currently primarily limited to coupling. Investigate finer grain (task hierarchy): smaller units of work, composed together; a finer-grain coupling approach; suited to architecture heterogeneity (Jetson TX1). EuroExa EU project proposal (Peter Bauer, Graham Riley, Hartree Centre). Potential for flexible precision (FPGAs).
25. Summary. To cross the chasm we should aim for single source science code, higher-level specifications and performance portability. DSELs are a potential way forward. Stella/GridTools and PSyclone/LFRic provide demonstrators; this is mostly complementary work. Perhaps PSyclone could use Stella? Potential for collaboration on optimisations? Convince other groups to try these out. Threading abstraction libraries are interesting; they are complementary to DSLs, and DSLs can use them. Functional parallelism and heterogeneous architectures are emerging issues. Data layout is still a potential issue.
26. Thank you for listening
27. The Abyss. What is PSyclone? (Wikipedia) "When I made Psyclone, I was at the height of my alcoholism and addiction, I was literally staring into the abyss."
28. Bring people with you
29. Hartree Centre Clients
More informationDynamical Core Rewrite
Dynamical Core Rewrite Tobias Gysi Oliver Fuhrer Carlos Osuna COSMO GM13, Sibiu Fundamental question How to write a model code which allows productive development by domain scientists runs efficiently
More informationCOSC 310: So*ware Engineering. Dr. Bowen Hui University of Bri>sh Columbia Okanagan
COSC 310: So*ware Engineering Dr. Bowen Hui University of Bri>sh Columbia Okanagan 1 Admin A2 is up Don t forget to keep doing peer evalua>ons Deadline can be extended but shortens A3 >meframe Labs This
More informationHIDDEN SLIDE Summary These slides are meant to be used as is to give an upper level view of perfsonar for an audience that is not familiar with the
HIDDEN SLIDE Summary These slides are meant to be used as is to give an upper level view of perfsonar for an audience that is not familiar with the concept. You *ARE* allowed to delete things you don t
More informationLLVM for the future of Supercomputing
LLVM for the future of Supercomputing Hal Finkel hfinkel@anl.gov 2017-03-27 2017 European LLVM Developers' Meeting What is Supercomputing? Computing for large, tightly-coupled problems. Lots of computational
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationPORTING CP2K TO THE INTEL XEON PHI. ARCHER Technical Forum, Wed 30 th July Iain Bethune
PORTING CP2K TO THE INTEL XEON PHI ARCHER Technical Forum, Wed 30 th July Iain Bethune (ibethune@epcc.ed.ac.uk) Outline Xeon Phi Overview Porting CP2K to Xeon Phi Performance Results Lessons Learned Further
More informationA performance portable implementation of HOMME via the Kokkos programming model
E x c e p t i o n a l s e r v i c e i n t h e n a t i o n a l i n t e re s t A performance portable implementation of HOMME via the Kokkos programming model L.Bertagna, M.Deakin, O.Guba, D.Sunderland,
More information