Carlos Osuna, MeteoSwiss.


1 Federal Department of Home Affairs FDHA, Federal Office of Meteorology and Climatology MeteoSwiss
DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models
Carlos Osuna, MeteoSwiss
V. Clement, O. Fuhrer, S. Moosbrugger, X. Lapillonne, P. Spoerri, T. Schulthess, F. Thuering, H. Vogt, T. Wicky

2 DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models
COSMO 2km ensembles (21 members), COSMO 1km
MPI Seminar, 21st November

3 DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models MPI Seminar, 21st November

4 Top 500 overview
(Figure: Top 500 systems by processor type (GPU, multi-core, MIC), labelling Piz Daint, Titan, Sequoia, Sunway TaihuLight, Gyoukou, Tianhe-2, Trinity, Cori, K Computer, Oakforest.)
Disruptive emergence of accelerators in HPC.
Large memory bandwidth has sped up our models. On NVIDIA GPUs: dynamical cores 2-3x, physical parametrizations 3-7x, spectral transforms > 10x.
Carlos Osuna, MPI Seminar, 21 November

5 How to achieve portability for our GFD models? Carlos Osuna, MPI Seminar, 21 November

6 DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models Carlos Osuna, MPI Seminar, 21 November

7 COSMO global 1km (discussion paper under review). Carlos Osuna, MPI Seminar, 21 November

8 DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models Carlos Osuna, MPI Seminar, 21 November

9 Carlos Osuna, MPI Seminar, 21 November

10 A domain specific language (DSL) for GFDs:
abstracts away all unnecessary details of a programming language (Fortran) implementation
concise syntax: only those keywords needed to express the numerical problem
The GridTools Ecosystem: a set of C++ APIs / libraries, a DSL for PDEs, a large class of problems, performance portability, separation of concerns, interface to Fortran, open source license
Carlos Osuna, MPI Seminar, 21 November

11 DSL for Weather Codes on Unstructured Grids
DSLs are a successful approach to deal with the software productivity gap.
ESCAPE develops a DSL based on GridTools: library tools for solving PDEs on multiple grids.
The DSL syntax requires only the grid-point operation, e.g. -4*u + u(i+1) + u(i-1) + u(j+1) + u(j-1).
Tiled loop templates and parallelization are abstracted.
Use of fast on-chip memory is also abstracted.
The grid is abstracted (in combination with Atlas).
Carlos Osuna, MPI Seminar, 21 November
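As a point of comparison, here is a minimal hand-written C++ sketch of the same Laplacian grid-point operation; the function and variable names (laplacian, lap, u, isize, jsize, ksize) are illustrative and not from the talk. Tiling, OpenMP/CUDA parallelization and staging through fast on-chip memory, which the DSL toolchain handles for the user, would still have to be added by hand on top of this.

#include <vector>

// Plain loop nest for lap = -4*u + u(i+1) + u(i-1) + u(j+1) + u(j-1),
// skipping a one-point halo at the horizontal boundaries.
void laplacian(std::vector<double>& lap, const std::vector<double>& u,
               int isize, int jsize, int ksize) {
    auto idx = [=](int i, int j, int k) { return (k * jsize + j) * isize + i; };
    for (int k = 0; k < ksize; ++k)
        for (int j = 1; j < jsize - 1; ++j)
            for (int i = 1; i < isize - 1; ++i)
                lap[idx(i, j, k)] = -4.0 * u[idx(i, j, k)]
                                  + u[idx(i + 1, j, k)] + u[idx(i - 1, j, k)]
                                  + u[idx(i, j + 1, k)] + u[idx(i, j - 1, k)];
}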

12 Writing Operators using DSL Carlos Osuna, MPI Seminar, 21 November
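As a hedged illustration of this slide title only, a simple operator written in the same gtclang-style DSL syntax used on slides 33-37 (stencil_function, storage, Do, vertical_region) could look as follows; avg_x and smooth are invented names for this sketch, not operators from the talk.

stencil_function avg_x {
  storage u;
  Do { return 0.5 * (u[i+1] + u[i-1]); }
};

stencil smooth {
  storage out, in;
  Do {
    vertical_region(k_start, k_end) {
      out = avg_x(in);
    }
  }
};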

13 Carlos Osuna, MPI Seminar, 21 November

14 Global Grids
New DSL constructs for stencils on global grids
Expose an unstructured grid API
Leverage grid structure for performance
(Grids shown: Icosahedral, Cubed sphere, Octahedral.)
Carlos Osuna, MPI Seminar, 21 November

15 Grid Abstraction
(Grids shown: Icosahedral, Octahedral, Dual-grid, Cubed sphere.)
constexpr auto offsets = connectivity<edges, vertices, Color>::offsets();
for (auto off : offsets) { eval(flux) += eval(pd(off)); }
or, equivalently:
eval(flux) = eval(sum_on_vertices(0.0, pd));
The DSL syntax elements are grid independent: the same DSL code can be compiled for multiple grids.
GridTools will support efficient native layouts for octahedral/icosahedral grids.
GridTools will use Atlas in order to support any grid.
Carlos Osuna, MPI Seminar, 21 November
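What these expressions abstract is a gather over the mesh connectivity. Below is a minimal plain-C++ sketch of the same edge-to-vertex reduction written against an explicit connectivity table; the names (sum_pd_on_vertices, edge_to_vertex, flux, pd) are illustrative assumptions, not the GridTools or Atlas API.

#include <cstddef>
#include <vector>

// Explicit neighbour reduction: for each edge, sum the field pd over the
// vertices connected to that edge. The DSL hides the connectivity table,
// its storage layout and the loop itself.
void sum_pd_on_vertices(std::vector<double>& flux, const std::vector<double>& pd,
                        const std::vector<std::vector<int>>& edge_to_vertex) {
    for (std::size_t e = 0; e < edge_to_vertex.size(); ++e) {
        double acc = 0.0;                 // init value of the reduction
        for (int v : edge_to_vertex[e])   // vertices neighbouring edge e
            acc += pd[v];
        flux[e] = acc;
    }
}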

16 Example of GridTools DSL: MPDATA Carlos Osuna, MPI Seminar, 21 November

17 Example of GridTools DSL: MPDATA Carlos Osuna, MPI Seminar, 21 November

18 Composition of multiple MPDATA operators
Full MPDATA implemented using the GridTools DSL: upwind fluxes, minmax limiter, compute fluxes, limit fluxes, flux solution, rho correction.
m_upwind_fluxes = make_computation<gpu>(
    domain_uwf, grid_,
    make_multistage(execute<forward>(),
        make_stage<upwind_flux, octahedral_topology_t, edges>(p_flux(), p_pd(), p_vn()),
        make_stage<upwind_fluz, octahedral_topology_t, vertices>(p_fluz(), p_pd(), p_wn()),
        make_stage<fluxzdiv, octahedral_topology_t, vertices>(p_divvd(), p_flux(), p_fluz(), p_dual_volumes(), p_edges_sign()),
        make_stage<advance_solution, octahedral_topology_t, vertices>(p_pd(), p_divvd(), p_rho())));
Carlos Osuna, MPI Seminar, 21 November

19 Grid Abstraction: Performance Portability
Thanks to the abstraction of the mesh we can implement any partitioning:
Equal partitioner implemented with Atlas
Structured partitioner implemented with Atlas
(Shown on an O32 mesh.)
Carlos Osuna, MPI Seminar, 21 November

20 Atlas: a library for NWP and climate modelling, Willem Deconinck (Deconinck et al.). Carlos Osuna, MPI Seminar, 21 November

21 Grid Abstraction: Performance Portability
Abstraction of the mesh allows implementing any indexing layout.
Indexing has a large impact on performance (NVIDIA P100, 128x128x80 domain):
  Layout   Bandwidth (GB/s)
  SN DA    211
  SN IA    270
  UN IA    130
  HN IA    256
Carlos Osuna, MPI Seminar, 21 November
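The abbreviations above are layout/indexing variants from the talk; the general point is that the order in which the (i, j, k, neighbour) indices are mapped to memory decides whether consecutive threads or loop iterations touch contiguous addresses. A small illustrative sketch, with names that are assumptions and not the GridTools layout machinery:

#include <cstddef>

// Two possible linearizations of an (isize x jsize x ksize) field.
// Whichever dimension gets stride 1 should match the fastest-varying
// loop or thread index to obtain coalesced, high-bandwidth access.
inline std::size_t index_i_stride1(int i, int j, int k, int isize, int jsize) {
    return (static_cast<std::size_t>(k) * jsize + j) * isize + i;  // contiguous in i
}
inline std::size_t index_k_stride1(int i, int j, int k, int jsize, int ksize) {
    return (static_cast<std::size_t>(i) * jsize + j) * ksize + k;  // contiguous in k
}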

22 DSLs and abstractions: state of the art
GridTools and Atlas:
rely solely on a C++ compiler, no external tools
provide a high level of abstraction
performance portability
But writing in this new programming model is hard (C++ expertise):
decreased productivity
unsafe: no protection from violating the parallel model
often requires expert knowledge to produce efficient code
Carlos Osuna, MPI Seminar, 21 November

23 What is the Future of GFD Models in a complex HPC environment?
1. Need to increase model development productivity
2. Need to decrease maintenance cost of models
3. Need to extend tools to a large set of models, methods, domains
4. Need to maximize efficiency
Carlos Osuna, MPI Seminar, 21 November

24 DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models Carlos Osuna, MPI Seminar, 21 November

25 Need to increase model development productivity
High-level DSLs: a concise, safe and intuitive syntax that removes the boilerplate of embedded GridTools
Carlos Osuna, MPI Seminar, 21 November

26 Need to decrease maintenance cost of models
Check for parallel errors
Auto-generation of boundary conditions and halo exchanges
Check for out-of-bounds accesses
Safe generation of loop data dependencies
Carlos Osuna, MPI Seminar, 21 November

27 Need to extend tools to a large set of models, methods, domains. Options:
Unify all tools in a single DSL software stack -> maintenance issue for the GFD models.
One DSL (programming model) per model, currently leading to an explosion of DSL solutions.
(Models: ICON-Ocean, IFS, COSMO, ICON-atmos, LFRic, ESCAPE2. DSLs and programming models: Liszt, HybridFortran, GridTools, PsyClone, Firedrake, CLAW.)
Carlos Osuna, MPI Seminar, 21 November


29 In the ESCAPE2 DNA3
Develop programming languages for GFD models that can increase productivity and lower maintenance
Standardization, avoid explosion of tools
Support multiple domains (FV weather, FV ocean, structured, unstructured, FE) in the standardization
Multiple domain languages or customizations, single tool
Language to be defined together with scientists
Carlos Osuna, MPI Seminar, 21 November

30 ESCAPE2 base technology
Modular compiler toolchain (based on modern compilers, LLVM)
GTCLANG
DAWN
Carlos Osuna, MPI Seminar, 21 November

31 ESCAPE2 base technology
Multiple variants of a language (FE plugin, IFS collocated, ICON, Ocean plugin), with structured syntax a[i+1] and unstructured syntax on_cells(+, a), all targeting DAWN.
Carlos Osuna, MPI Seminar, 21 November

32 Example Horizontal Diffusion Carlos Osuna, MPI Seminar, 21 November 2017 Fabian Thüring

33 Example Horizontal Diffusion
stencil_function laplacian {
  storage phi;
  Do { return -4.0 * phi + phi[i+1] + phi[i-1] + phi[j+1] + phi[j-1]; }
};

stencil hd_type2 {
  storage hd, in;
  temporary_storage lap;
  Do {
    vertical_region(k_start, k_end) {
      lap = laplacian(in);
      hd = laplacian(lap);
    }
  }
};
Carlos Osuna, MPI Seminar, 21 November
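For reference, a plain C++ sketch of what hd_type2 computes, i.e. the Laplacian applied twice through a temporary (hd = lap(lap(in))); the names and the simplified one-point halo handling are illustrative assumptions, not generated code.

#include <vector>

// Reference loops for hd = laplacian(laplacian(in)); the vector 'lap' plays the
// role of the DSL temporary_storage, and the second application needs the first
// one on a ring of neighbouring points, hence the shrinking loop bounds.
void hd_type2_reference(std::vector<double>& hd, const std::vector<double>& in,
                        int isize, int jsize, int ksize) {
    auto idx = [=](int i, int j, int k) { return (k * jsize + j) * isize + i; };
    auto lap_at = [&](const std::vector<double>& f, int i, int j, int k) {
        return -4.0 * f[idx(i, j, k)] + f[idx(i + 1, j, k)] + f[idx(i - 1, j, k)]
             + f[idx(i, j + 1, k)] + f[idx(i, j - 1, k)];
    };
    std::vector<double> lap(in.size(), 0.0);
    for (int k = 0; k < ksize; ++k) {
        for (int j = 1; j < jsize - 1; ++j)
            for (int i = 1; i < isize - 1; ++i)
                lap[idx(i, j, k)] = lap_at(in, i, j, k);
        for (int j = 2; j < jsize - 2; ++j)
            for (int i = 2; i < isize - 2; ++i)
                hd[idx(i, j, k)] = lap_at(lap, i, j, k);
    }
}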

34 Safety First
stencil hd_type2 {
  storage u, utens;
  temporary_storage uavg;
  Do {
    vertical_region(k_start, k_end) {
      uavg = (u[i+1] + u[i-1]) * 0.5;
      utens = uavg[k+1];   // reads the temporary at a vertical offset inside the same vertical_region
    }
  }
};
Carlos Osuna, MPI Seminar, 21 November

35 Safety First
stencil hd_type2 {
  storage u, utens;
  temporary_storage uavg;
  Do {
    vertical_region(k_start, k_end) {
      uavg = (u[i+1] + u[i-1]) * 0.5;
      utens = uavg[j+1] + uavg[j-1];   // reads the temporary at horizontal offsets in the same Do-method
    }
  }
};
Carlos Osuna, MPI Seminar, 21 November


37 Interoperability: Escaping the language
DSL code:
stencil_function laplacian {
  storage phi;
  Do { return -4.0 * phi + phi[i+1] + phi[i-1] + phi[j+1] + phi[j-1]; }
};

stencil hd_type2 {
  storage hd, in;
  temporary_storage lap;
  Do {
    vertical_region(k_start, k_end) {
      lap = laplacian(in);
      hd = laplacian(lap);
    }
  }
};

Standard C++ code (+OpenMP, +OpenACC, +CUDA):
#pragma omp parallel for nowait
#pragma acc
for (int i = 0; i < isize; ++i) {
  for (int j = 0; j < jsize; ++j) {
    for (int k = 0; k < ksize; ++k) {
      udiff(i,j,k) = hd(i+1,j,k) + hd(i-1,j,k);
    }
  }
}
Carlos Osuna, MPI Seminar, 21 November

38 Standardization: SIR
Multiple DSL (thin) frontends can generate the standard SIR scheme: C++ GTCLANG, Fortran CLAW, Fortran PsyClone, Python GridTools4Py, ESCAPE2 / Ocean.
Example: the statement b = a(i+1) + a(i-1) expressed as SIR:
{
  "filename": "/code/access_test.cpp",
  "stencils": [ {
    "name": "hori_diff",
    "loc": { "line": 24, "column": 8 },
    "ast": {
      "block_stmt": {
        "statements": {
          "expr_stmt": {
            "assignment_expr": {
              "left": { "field_access_expr": { "name": "b", "offset": [] } },
              "right": {
                "binary_op": {
                  "left": { "field_access_expr": { "name": "a", "offset": [1,0,0] } },
                  "right": { "field_access_expr": { "name": "a", "offset": [-1,0,0] } }
                } } } } } } } } ] }
Carlos Osuna, MPI Seminar, 21 November

39 DAWN: Compiler optimization passes Carlos Osuna, MPI Seminar, 21 November

40 COSMO dycore evaluation of DSL toolchain
(Performance chart, NVIDIA P100: Type 2 u, Type 2 v, Type 2 w, Type 2 pp, Smagorinsky, Horizontal Diffusion, Advection (metric and time).)
Carlos Osuna, MPI Seminar, 21 November

41 Conclusions
The future of GFD models (high resolution, multi-decade simulations) will bring serious computational challenges: efficiency and production cost, maintenance cost.
We need to find the right programming models so that adaptation to new architectures does not hinder scientific development -> high productivity.
Single efforts are not a valid model anymore, we need collaborations.
ESCAPE2 is a great opportunity to define the new language with scientists, we have the technology in place.
Carlos Osuna, MPI Seminar, 21 November

42 BACKUPS MPI Seminar, 21st November 2017

43 MPI Seminar, 21st November 2017

44 Federal Department of Home Affairs FDHA, Federal Office of Meteorology and Climatology MeteoSwiss
MeteoSwiss, Operation Center 1, CH-8058 Zurich-Airport
MeteoSvizzera, Via ai Monti 146, CH-6605 Locarno-Monti
MétéoSuisse, 7bis, av. de la Paix, CH-1211 Genève 2
MétéoSuisse, Chemin de l'Aérologie, CH-1530 Payerne
Internal presentation, Valentin Clément 44
