A PSyclone perspective of the big picture. Rupert Ford, STFC Hartree Centre

2 Requirement I: Maintainable software
"Maintain codes in a way that subject matter experts can still modify the code" (Leslie Hart, NOAA, presenting at NCAR MultiCore 2016).
"Want to use a single source code for all architectures" (Ulrich Schättler, DWD, presenting at NCAR MultiCore 2015).
Key aim: single source science code.

3 Requirement II: High performance
Efficient and scalable software on current and future HPC architectures.
"We want performance on multiple architectures and maintainable code" (Matthew Norman, ORNL, presenting at NCAR MultiCore 2015).
Key aim: performance portability.

4 What is the problem? I
Codes: complex and evolving science; it is hard to restructure codes.
Software: software has a long life (20+ years), so it outlives hardware architectures; compilers are complex and evolving; standards are evolving (OpenMP, OpenACC, MPI, ...).

5 What is the problem? II: Hardware
Multiple levels of parallelism: inter-node (MPI), intra-node (OpenMP/OpenACC), SIMD vectorisation, oversubscription.
Heterogeneity is coming.
Very different hardware solutions (many-core vs. GPU).
Hardware solutions change rapidly.
Memory bandwidth is increasing, but so is memory latency.
Memory directives coming?

6 Where are we now?
Current best practice: code for hierarchies of parallelism (loops, blocks, partitions); use standard directives (OpenMP/OpenACC); optimise separately for many-core and GPU; try to minimise code differences.
In practice: large legacy (MPI) codes that are difficult to modify; OpenMP used only for oversubscription (or not at all); few usable GPU implementations; HPC experts optimise for the latest architecture.
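
As a rough illustration of coding for a hierarchy of parallelism, the Python sketch below splits a 1D index space in three levels. The first level is partitions (for example, one per MPI rank). The second is blocks within each partition (for example, one per thread). The innermost contiguous index ranges are left as candidates for SIMD vectorisation. The function names and the even-split policy are assumptions made for illustration, not taken from any particular model.

```python
# Hypothetical sketch: hierarchical decomposition of a 1D index space into
# partitions (e.g. MPI ranks) -> blocks (e.g. threads) -> contiguous inner
# ranges (candidates for SIMD vectorisation).


def partition(n, nparts):
    """Split range(n) into nparts contiguous, nearly equal (start, stop) pairs."""
    base, extra = divmod(n, nparts)
    bounds, start = [], 0
    for p in range(nparts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def blocks(start, stop, block_size):
    """Split [start, stop) into (start, stop) blocks of at most block_size points."""
    return [(b, min(b + block_size, stop)) for b in range(start, stop, block_size)]


def hierarchical_indices(n, nranks, block_size):
    """Return {rank: [indices of block 0, indices of block 1, ...]}."""
    return {
        rank: [list(range(b0, b1)) for b0, b1 in blocks(start, stop, block_size)]
        for rank, (start, stop) in enumerate(partition(n, nranks))
    }


decomposition = hierarchical_indices(n=10, nranks=3, block_size=2)
for rank, rank_blocks in decomposition.items():
    print(rank, rank_blocks)
# 0 [[0, 1], [2, 3]]
# 1 [[4, 5], [6]]
# 2 [[7, 8], [9]]
```

Each level needs only the bounds handed down by the level above, which is what lets the same loop body be reused at every level.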

7 Continue as we are?

8 Is there an alternative?
Libraries. General: MPI, netCDF, BLAS. Domain-specific: PIO, MCT, OASIS3, ESMF (infrastructure).
Threading abstraction (MPI + X): OCCA (targets OpenMP, CUDA, OpenCL, OpenACC); Kokkos (C++, targets OpenMP, CUDA).
These are performance portable for the given parallelisation strategy.

9 Optimised code
Complex parallel code + complex parallel architectures + complex compilers = a complex optimisation space.
Do we really expect there to be a single minimum?

10 Simple compiler example

11 Optimised code
"Code changes to get good performance were invasive, with likely impacts to the CPU, MIC performance" (Mark Govett, NOAA, presenting at NCAR MultiCore 2016 on FV3 for GPUs).
"Exact same code performing on all architectures is a pipe dream" (Matthew Norman, ORNL, presenting at NCAR MultiCore 2015).
Single source optimised code is not attainable.

12 Is there a solution?
Separation of concerns: separate the science code from parallelisation and optimisation.
Single science source.
Targeted parallelisation and optimisation for portable performance.
Achievable using domain-specific knowledge.

13 Domain-specific knowledge I: finite element/volume/difference-specific
Operations over a mesh.
Typically the same operation at each element/volume/point.
Data parallel (typically independent operations).
Nearest-neighbour communications for stencils.
Global reduction(s) for convergence and/or conservation.
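
The properties listed above show up in a small Python sketch. It is an illustration, not generated DSL code. The same 5-point stencil is applied at every interior point of a structured 2D mesh. Each update reads only nearest neighbours and does not depend on any other update. A global sum stands in for a conservation check. The function names are hypothetical.

```python
# Illustrative sketch: same operation at every point, nearest-neighbour reads,
# independent (data-parallel) updates, then a global reduction.


def five_point_laplacian(field):
    """Discrete Laplacian at interior points of a 2D list; boundary points are 0."""
    ny, nx = len(field), len(field[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):          # every (j, i) update is independent,
        for i in range(1, nx - 1):      # so both loops are data parallel
            out[j][i] = (field[j - 1][i] + field[j + 1][i]
                         + field[j][i - 1] + field[j][i + 1]
                         - 4.0 * field[j][i])
    return out


def global_sum(field):
    """A global reduction, e.g. for a conservation or convergence check."""
    return sum(sum(row) for row in field)


field = [[0.0] * 5 for _ in range(5)]
field[2][2] = 1.0  # a single spike in the middle of a 5x5 mesh
lap = five_point_laplacian(field)
print(lap[2][2], global_sum(lap))  # prints: -4.0 0.0
```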

14 Domain-specific knowledge II: weather/climate-specific
Fixed mesh topology.
Structured or semi-structured (quasi-uniform) mesh.
Vertical resolution << horizontal resolution, giving a 2D + 1D mesh (extrusion): structured or unstructured data in the horizontal, structured in the vertical.
Dynamics mostly independent in the vertical; physics mostly independent in the horizontal.
Parallelise in the horizontal.
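
Here is a minimal Python sketch of the column view of an extruded mesh. It assumes each field is stored as one list of vertical levels per horizontal cell. A toy physics-like kernel works on a single column. Because columns are independent, the loop over horizontal cells is the one that gets parallelised. The kernel and helper names are hypothetical. The thread pool only shows the structure: CPython threads will not speed up this pure-Python arithmetic.

```python
# Hypothetical sketch: column kernels on an extruded mesh, parallel over the
# horizontal (one task per column).
from concurrent.futures import ThreadPoolExecutor


def column_kernel(column):
    """Toy column operation: running sum from the bottom level upwards."""
    out, total = [], 0.0
    for value in column:
        total += value
        out.append(total)
    return out


def apply_over_horizontal(columns, workers=4):
    """Apply column_kernel to every column; columns may be processed in any order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(column_kernel, columns))  # map keeps input order


columns = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]  # two horizontal cells, three levels
result = apply_over_horizontal(columns)
print(result)  # [[1.0, 3.0, 6.0], [0.5, 1.0, 1.5]]
```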

15 Existing DSLs I
Two main DSLs are being used/developed by major centres for use in this domain:
Stella/GridTools, designed to support FE/FV/FD ESM dynamical core stencils;
PSyclone/LFRic, designed to allow support for FE/FV/FD ESM models.
Also Firedrake, a more general-purpose DSL for FE.

16 Existing DSLs II
PSyclone and Firedrake add any required communication (halo communication and global reductions). Stella?
PSyclone will be investigated for use with physics.
PSyclone is designed to support multiple APIs: lfric and gocean (2D NEMO-esque).
Could PSyclone use Stella? Could PSyclone/Stella use OCCA/Kokkos?
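
The sketch below makes concrete the kind of communication a DSL can insert on the user's behalf. It simulates, in a single Python process rather than with MPI, a periodic 1D field split across two "ranks", each with one halo cell per side. After a halo exchange, the decomposed 3-point stencil reproduces the undecomposed result. A global reduction then combines the per-rank partial sums. All names are hypothetical.

```python
# Hypothetical single-process stand-in for MPI: halo exchange before a
# nearest-neighbour stencil, then a global reduction of partial sums.


def stencil(u):
    """3-point average at interior points of a list that includes halo cells."""
    return [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]


def decompose(u, nranks):
    """Split u into owned chunks, each padded with one halo cell per side."""
    size = len(u) // nranks  # assumes len(u) is divisible by nranks
    return [[0.0] + u[r * size:(r + 1) * size] + [0.0] for r in range(nranks)]


def halo_exchange(local):
    """Fill each halo cell with the neighbouring rank's owned value (periodic)."""
    n = len(local)
    for r in range(n):
        left, right = local[(r - 1) % n], local[(r + 1) % n]
        local[r][0] = left[-2]    # last owned value of the left neighbour
        local[r][-1] = right[1]   # first owned value of the right neighbour


def global_reduction(partials):
    """Stand-in for an all-reduce: combine per-rank partial sums."""
    return sum(partials)


u = [float(i) for i in range(8)]
serial = stencil([u[-1]] + u + [u[0]])  # undecomposed, periodic reference

local = decompose(u, 2)
halo_exchange(local)
parallel = [x for chunk in local for x in stencil(chunk)]
total = global_reduction([sum(stencil(chunk)) for chunk in local])
print(parallel == serial, total)
```

The periodic 3-point average conserves the sum of the field, so the reduced total should match sum(u) up to rounding.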

17 DSL approach
Algorithm level: a logically global view; no iteration over the mesh; no reference to parallelism; operations on full fields.
Kernel level: the unit of work, a mesh point/element or a column; can be run in any order.
A domain-specific compiler takes the Algorithm and Kernel specifications and generates architecture-specific, optimised parallel code.
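
The Algorithm/Kernel split described above can be sketched in Python as three layers. The names are hypothetical and are not PSyclone's API. In particular, PSyclone generates the middle (PSy) layer from the Algorithm and Kernel code. Here that layer is written by hand, only to show where the iteration over the mesh, and any parallelism, would live.

```python
# Hypothetical three-layer sketch: Algorithm -> PSy (iteration) -> Kernel.


# Kernel layer: the unit of work is one column; no mesh loops, no parallelism.
def scale_column_kernel(column, factor):
    return [factor * value for value in column]


# PSy layer: owns the loop over the mesh. In a real DSL this is generated and
# is where threading, halo exchanges and reductions would be added.
def invoke(kernel, field, *args):
    return [kernel(column, *args) for column in field]


# Algorithm layer: logically global view, operating on whole fields.
def algorithm(field):
    doubled = invoke(scale_column_kernel, field, 2.0)
    return invoke(scale_column_kernel, doubled, 0.5)


field = [[1.0, 2.0], [3.0, 4.0]]  # two columns, two levels each
print(invoke(scale_column_kernel, field, 2.0))  # [[2.0, 4.0], [6.0, 8.0]]
print(algorithm(field) == field)                # True
```

Because the kernel never sees the loop, the PSy layer can reorder or parallelise the columns without any change to the science code.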

18 DSL languages I
Both PSyclone and Stella are DSELs (domain-specific embedded languages).
PSyclone is embedded in Fortran: Algorithm and Kernel code are written in Fortran.
Stella is embedded in C++: Algorithm and Kernel code are written in C++.

19 DSL languages II
PSyclone rationale for Fortran:
Expertise: scientists have existing expertise in Fortran.
Familiarity: scientists (say they) want to continue with Fortran.
Development: unsupported features can be written in Fortran.
Adoption: don't move too far, as scientific adoption is key.
Legacy: it should be easier to integrate existing Fortran code.
Performance: Fortran is a reasonable language for HPC.
"Keep code simple, portable, readable and similar to Fortran" (Ulrich Schättler, DWD, presenting at NCAR MultiCore 2015).
Ninja DSL! Fortran code is just a specification.
The Algorithm/Kernel separation allows for different languages.

20 Coded kernels
PSyclone kernels are written in Fortran, so any science can be written. PSyclone supports built-ins, where no kernel code is required.
Stella kernels are written in C++. Limited to stencil descriptions?
Firedrake has no coded kernels and is closer to an FE description; however, there is trap-door access to manually written C kernels.
Coded Fortran kernels require a data model: IJK, IKJ, KIJ, KL.
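
Index arithmetic shows why a coded kernel needs a data model. The same logical element (i, j, k) sits at a different offset in memory depending on storage order. In this Python sketch, the first letter of a layout name is assumed to be the fastest-varying index, so "IJK" matches a Fortran array a(i,j,k). The slide's KL layout is not modelled, and the helper names are hypothetical. A kernel that reads data only through the index function gives the same answer whichever layout is chosen.

```python
# Hypothetical sketch: flat offsets for three storage orders (first letter
# varies fastest) and a kernel that is layout-agnostic via flat_index.


def flat_index(layout, i, j, k, ni, nj, nk):
    """Offset of element (i, j, k) in a flat array stored in the given order."""
    if layout == "IJK":
        return i + ni * (j + nj * k)
    if layout == "IKJ":
        return i + ni * (k + nk * j)
    if layout == "KIJ":
        return k + nk * (i + ni * j)
    raise ValueError(f"unsupported layout: {layout}")


def column_sum(data, layout, i, j, ni, nj, nk):
    """Sum one vertical column, reading only through flat_index."""
    return sum(data[flat_index(layout, i, j, k, ni, nj, nk)] for k in range(nk))


ni, nj, nk = 2, 3, 4
for layout in ("IJK", "IKJ", "KIJ"):
    data = [0] * (ni * nj * nk)
    for i in range(ni):
        for j in range(nj):
            for k in range(nk):
                data[flat_index(layout, i, j, k, ni, nj, nk)] = 100 * i + 10 * j + k
    print(layout, column_sum(data, layout, 1, 2, ni, nj, nk))  # 486 each time
```

Which layout performs best differs by architecture, for example contiguous columns for KIJ versus contiguous horizontal rows for IJK. This is why the data model matters for coded kernels.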

21 DSL optimisations I
Delaying optimisations

22 DSL optimisations II
Large optimisation space. Can optimisations be automated? Perhaps, for reasonable performance. It is interesting to search the optimisation space.
PSyclone takes a user-specified optimisation approach to support the expert: the HPC expert provides an optimisation recipe.
Compile-time optimisation (static analysis).
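
An expert-supplied optimisation recipe can be sketched as a script that transforms a schedule of loops produced from the science code. The Python classes and functions below are hypothetical and deliberately simplified. For example, fusion is assumed to be legal rather than checked by dependence analysis, and the kernel names are invented. None of this is PSyclone's transformation API.

```python
# Hypothetical sketch of an optimisation recipe applied to a schedule of loops.
# Fusion legality is assumed here, not checked.
from dataclasses import dataclass, field


@dataclass
class Loop:
    kernels: list            # names of kernels called inside this loop
    parallel: bool = False   # e.g. would become an OpenMP parallel loop


@dataclass
class Schedule:
    loops: list = field(default_factory=list)


def fuse(schedule, first):
    """Replace loops `first` and `first + 1` by one loop calling both kernels."""
    a, b = schedule.loops[first], schedule.loops[first + 1]
    schedule.loops[first:first + 2] = [Loop(a.kernels + b.kernels, a.parallel and b.parallel)]


def parallelise(schedule, index):
    """Mark one loop as parallel."""
    schedule.loops[index].parallel = True


def cpu_recipe(schedule):
    """One possible recipe written by an HPC expert; a GPU recipe might differ."""
    fuse(schedule, 0)
    for index in range(len(schedule.loops)):
        parallelise(schedule, index)
    return schedule


s = Schedule([Loop(["compute_u"]), Loop(["compute_v"]), Loop(["compute_p"])])
cpu_recipe(s)
print([(loop.kernels, loop.parallel) for loop in s.loops])
# [(['compute_u', 'compute_v'], True), (['compute_p'], True)]
```

The science code that produced the schedule stays the same. Only the recipe changes when the target architecture does.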

23 Maintenance benefits
Often overlooked. Not only single source, but also:
simplified science code;
a higher-level problem specification;
lots of code is generated (NEMO loop bounds!).

24 Have we missed anything?
Functional parallelism: currently primarily limited to coupling.
Investigate finer grain (task hierarchy): smaller units of work that are then composed; a finer-grain coupling approach; suited to architecture heterogeneity (Jetson TX1).
EuroExa EU project proposal (Peter Bauer, Graham Riley, Hartree Centre).
Potential for flexible precision (FPGAs).
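
Finer-grain functional parallelism can be pictured as a graph of small tasks with explicit dependencies. The tasks are released in waves, and all tasks in a wave could run concurrently, for example on different devices of a heterogeneous node. This Python sketch uses the standard library's graphlib.TopologicalSorter (Python 3.9+). The task names are hypothetical.

```python
# Hypothetical sketch: compose small units of work into a task graph and
# release them in waves whose members have no unfinished predecessors.
from graphlib import TopologicalSorter


def schedule_waves(dependencies):
    """Return a list of waves; tasks within a wave could run concurrently."""
    sorter = TopologicalSorter(dependencies)  # maps task -> its predecessors
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave has finished
    return waves


tasks = {
    "dynamics": {"read_input"},
    "radiation": {"read_input"},
    "ocean_coupling": {"dynamics", "radiation"},
    "write_output": {"ocean_coupling"},
}
print(schedule_waves(tasks))
# [['read_input'], ['dynamics', 'radiation'], ['ocean_coupling'], ['write_output']]
```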

25 Summary
To cross the chasm we should aim for single source science code, higher-level specifications and performance portability.
DSELs are a potential way forward. Stella/GridTools and PSyclone/LFRic provide demonstrators. This is mostly complementary work. Perhaps PSyclone could use Stella? Potential for collaboration on optimisations? Convince other groups to try these out.
Threading abstraction libraries are interesting. They are complementary to DSLs, since DSLs can use them.
Functional parallelism and heterogeneous architectures are emerging issues.
Data layout is still a potential issue.

26 Thank you for listening

27 The Abyss
What is PSyclone? (Wikipedia): "When I made Psyclone, I was at the height of my alcoholism and addiction, I was literally staring into the abyss"

28 Bring people with you

29 Hartree Centre Clients


Introduc)on to High Performance Compu)ng Advanced Research Computing Introduc)on to High Performance Compu)ng Advanced Research Computing Outline What cons)tutes high performance compu)ng (HPC)? When to consider HPC resources What kind of problems are typically solved?

More information

Carlos Osuna. Meteoswiss.

Carlos Osuna. Meteoswiss. Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models Carlos Osuna Meteoswiss carlos.osuna@meteoswiss.ch

More information

RDD and Strategy Pa.ern

RDD and Strategy Pa.ern RDD and Strategy Pa.ern CSCI 3132 Summer 2011 1 OO So1ware Design The tradi7onal view of objects is that they are data with methods. Smart Data. But, it is be.er to think of them as en##es that have responsibili#es.

More information

From Tadpoles to Supercomputers. Paul Fox, Mike Hull, Theo Marke9os, Simon Moore, Ma9 Naylor

From Tadpoles to Supercomputers. Paul Fox, Mike Hull, Theo Marke9os, Simon Moore, Ma9 Naylor From Tadpoles to Supercomputers Paul Fox, Mike Hull, Theo Marke9os, Simon Moore, Ma9 Naylor UKDF 2014 Mo

More information

The IBM Blue Gene/Q: Application performance, scalability and optimisation

The IBM Blue Gene/Q: Application performance, scalability and optimisation The IBM Blue Gene/Q: Application performance, scalability and optimisation Mike Ashworth, Andrew Porter Scientific Computing Department & STFC Hartree Centre Manish Modani IBM STFC Daresbury Laboratory,

More information

Informa(cs 231: What is Design? October 9, 2012

Informa(cs 231: What is Design? October 9, 2012 Informa(cs 231: What is Design? October 9, 2012 IDEO s Deep Dive Excellent example of the user- centered design process IDEO s Deep Dive Video Part 1 - hgp://www.youtube.com/watch?v=oon05q030qo Part 2

More information

Adapting Numerical Weather Prediction codes to heterogeneous architectures: porting the COSMO model to GPUs

Adapting Numerical Weather Prediction codes to heterogeneous architectures: porting the COSMO model to GPUs Adapting Numerical Weather Prediction codes to heterogeneous architectures: porting the COSMO model to GPUs O. Fuhrer, T. Gysi, X. Lapillonne, C. Osuna, T. Dimanti, T. Schultess and the HP2C team Eidgenössisches

More information

Physis: An Implicitly Parallel Framework for Stencil Computa;ons

Physis: An Implicitly Parallel Framework for Stencil Computa;ons Physis: An Implicitly Parallel Framework for Stencil Computa;ons Naoya Maruyama RIKEN AICS (Formerly at Tokyo Tech) GTC12, May 2012 1 è Good performance with low programmer produc;vity Mul;- GPU Applica;on

More information

Cisco Exam Dumps PDF for Guaranteed Success

Cisco Exam Dumps PDF for Guaranteed Success Cisco 300 080 Exam Dumps PDF for Guaranteed Success The PDF version is simply a copy of a Portable Document of your Cisco 300 080 quesons and answers product. The Cisco Cerfied Network Professional Collaboraon

More information

Porting a parallel rotor wake simulation to GPGPU accelerators using OpenACC

Porting a parallel rotor wake simulation to GPGPU accelerators using OpenACC DLR.de Chart 1 Porting a parallel rotor wake simulation to GPGPU accelerators using OpenACC Melven Röhrig-Zöllner DLR, Simulations- und Softwaretechnik DLR.de Chart 2 Outline Hardware-Architecture (CPU+GPU)

More information

High-level Abstraction for Block Structured Applications: A lattice Boltzmann Exploration

High-level Abstraction for Block Structured Applications: A lattice Boltzmann Exploration High-level Abstraction for Block Structured Applications: A lattice Boltzmann Exploration Jianping Meng, Xiao-Jun Gu, David R. Emerson, Gihan Mudalige, István Reguly and Mike B Giles Scientific Computing

More information

Particle-in-Cell Simulations on Modern Computing Platforms. Viktor K. Decyk and Tajendra V. Singh UCLA

Particle-in-Cell Simulations on Modern Computing Platforms. Viktor K. Decyk and Tajendra V. Singh UCLA Particle-in-Cell Simulations on Modern Computing Platforms Viktor K. Decyk and Tajendra V. Singh UCLA Outline of Presentation Abstraction of future computer hardware PIC on GPUs OpenCL and Cuda Fortran

More information

OpenACC/CUDA/OpenMP... 1 Languages and Libraries... 3 Multi-GPU support... 4 How OpenACC Works... 4

OpenACC/CUDA/OpenMP... 1 Languages and Libraries... 3 Multi-GPU support... 4 How OpenACC Works... 4 OpenACC Course Class #1 Q&A Contents OpenACC/CUDA/OpenMP... 1 Languages and Libraries... 3 Multi-GPU support... 4 How OpenACC Works... 4 OpenACC/CUDA/OpenMP Q: Is OpenACC an NVIDIA standard or is it accepted

More information

Multi/Many Core Programming Strategies

Multi/Many Core Programming Strategies Multicore Challenge Conference 2012 UWE, Bristol Multi/Many Core Programming Strategies Greg Michaelson School of Mathematical & Computer Sciences Heriot-Watt University Multicore Challenge Conference

More information

Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM

Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM 25th March, GTC 2014, San Jose CA AnE- Pekka Hynninen ane.pekka.hynninen@nrel.gov NREL is a na*onal laboratory of the U.S. Department of Energy,

More information

Introduction to Xeon Phi. Bill Barth January 11, 2013

Introduction to Xeon Phi. Bill Barth January 11, 2013 Introduction to Xeon Phi Bill Barth January 11, 2013 What is it? Co-processor PCI Express card Stripped down Linux operating system Dense, simplified processor Many power-hungry operations removed Wider

More information

Running the FIM and NIM Weather Models on GPUs

Running the FIM and NIM Weather Models on GPUs Running the FIM and NIM Weather Models on GPUs Mark Govett Tom Henderson, Jacques Middlecoff, Jim Rosinski, Paul Madden NOAA Earth System Research Laboratory Global Models 0 to 14 days 10 to 30 KM resolution

More information

The Stampede is Coming: A New Petascale Resource for the Open Science Community

The Stampede is Coming: A New Petascale Resource for the Open Science Community The Stampede is Coming: A New Petascale Resource for the Open Science Community Jay Boisseau Texas Advanced Computing Center boisseau@tacc.utexas.edu Stampede: Solicitation US National Science Foundation

More information

Soft GPGPUs for Embedded FPGAS: An Architectural Evaluation

Soft GPGPUs for Embedded FPGAS: An Architectural Evaluation Soft GPGPUs for Embedded FPGAS: An Architectural Evaluation 2nd International Workshop on Overlay Architectures for FPGAs (OLAF) 2016 Kevin Andryc, Tedy Thomas and Russell Tessier University of Massachusetts

More information

Getting DCIM Right the First or Second Time Around. PRESENTED BY Chris James CEO, DCIMPro

Getting DCIM Right the First or Second Time Around. PRESENTED BY Chris James CEO, DCIMPro Getting DCIM Right the First or Second Time Around. PRESENTED BY Chris James CEO, DCIMPro Agenda: What are the Core Elements of DCIM? What is DCIM and why? The DCIM Maturation Model What is a Successful

More information

Dynamical Core Rewrite

Dynamical Core Rewrite Dynamical Core Rewrite Tobias Gysi Oliver Fuhrer Carlos Osuna COSMO GM13, Sibiu Fundamental question How to write a model code which allows productive development by domain scientists runs efficiently

More information

COSC 310: So*ware Engineering. Dr. Bowen Hui University of Bri>sh Columbia Okanagan

COSC 310: So*ware Engineering. Dr. Bowen Hui University of Bri>sh Columbia Okanagan COSC 310: So*ware Engineering Dr. Bowen Hui University of Bri>sh Columbia Okanagan 1 Admin A2 is up Don t forget to keep doing peer evalua>ons Deadline can be extended but shortens A3 >meframe Labs This

More information

HIDDEN SLIDE Summary These slides are meant to be used as is to give an upper level view of perfsonar for an audience that is not familiar with the

HIDDEN SLIDE Summary These slides are meant to be used as is to give an upper level view of perfsonar for an audience that is not familiar with the HIDDEN SLIDE Summary These slides are meant to be used as is to give an upper level view of perfsonar for an audience that is not familiar with the concept. You *ARE* allowed to delete things you don t

More information

LLVM for the future of Supercomputing

LLVM for the future of Supercomputing LLVM for the future of Supercomputing Hal Finkel hfinkel@anl.gov 2017-03-27 2017 European LLVM Developers' Meeting What is Supercomputing? Computing for large, tightly-coupled problems. Lots of computational

More information

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)

CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can

More information

PORTING CP2K TO THE INTEL XEON PHI. ARCHER Technical Forum, Wed 30 th July Iain Bethune

PORTING CP2K TO THE INTEL XEON PHI. ARCHER Technical Forum, Wed 30 th July Iain Bethune PORTING CP2K TO THE INTEL XEON PHI ARCHER Technical Forum, Wed 30 th July Iain Bethune (ibethune@epcc.ed.ac.uk) Outline Xeon Phi Overview Porting CP2K to Xeon Phi Performance Results Lessons Learned Further

More information

A performance portable implementation of HOMME via the Kokkos programming model

A performance portable implementation of HOMME via the Kokkos programming model E x c e p t i o n a l s e r v i c e i n t h e n a t i o n a l i n t e re s t A performance portable implementation of HOMME via the Kokkos programming model L.Bertagna, M.Deakin, O.Guba, D.Sunderland,

More information