Carlos Osuna. Meteoswiss.
|
|
- Margaret Cooper
- 5 years ago
- Views:
Transcription
1 Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models Carlos Osuna Meteoswiss V. Clement, O. Fuhrer, S. Moosbrugger, X. Lapillonne, P. Spoerri, T. Schulthess, F. Thuering, H. Vogt, T. Wicky
2 DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models COSMO 2km ensembles(21 members) COSMO 1km MPI Seminar, 21st Novemebr
3 DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models MPI Seminar, 21st Novemebr
4 Top 500 overview Piz daint Titan Sequoia GPU Sunway TaihuLight Disruptvie emergence of accelerators in HPC. Gyoukou Large memory bandwidth have speeded up our models. On NVIDIA GPUs: Tianhe-2 dynamicalcores 2-3x physical Trinity parametrizations 3-7x spectral transforms > 10x Cori K Computer Multi-core oakforest MIC 4 Carlos Osuna, MPI Seminar, 21 November
5 How to achieve portability forourgfd models? Carlos Osuna, MPI Seminar, 21 November
6 DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models Carlos Osuna, MPI Seminar, 21 November
7 COSMO global 1km Discussion paper under review Carlos Osuna, MPI Seminar, 21 November
8 DSL Toolchains for Performance Portable Geophysical Fluid Dynamic Models Carlos Osuna, MPI Seminar, 21 November
9 Carlos Osuna, MPI Seminar, 21 November
10 A domain specific language(dsl) for GFDs: abstracts away all unnecessary details of a programming language(fortran) implementation concise syntax: only those keywords needed to express the numerical problem The GridTools Ecosystem Set of C++ APIs / Libraries DSL for PDEs Large class of problems Performance Portability Separation of concerns Interface to Fortran Open source license Carlos Osuna, MPI Seminar, 21 November
11 DSL for Weather Codes on Unstructured Grids DSLs are a successful approach to deal with the software productivity gap ESCAPE develops a DSL based on GridTools: library tools for solving PDEs on multiple grids. The DSL syntax requires only the grid-point operation: ) = -4* u + u(i+1) + u(i-1) + u(j+1) + u(j-1) Tiled loop templates and parallelization are abstracted Use of fast on-chip memory is also abstracted. Grid is abstracted (in combination with Atlas) Carlos Osuna, MPI Seminar, 21 November
12 Writing Operators using DSL Carlos Osuna, MPI Seminar, 21 November
13 Carlos Osuna, MPI Seminar, 21 November
14 Global Grids New DSL constructs for stencils on global grids Expose a unstructured grid API Leverage grid structure for performance Icosahedral Cubed sphere Octahedral Carlos Osuna, MPI Seminar, 21 November
15 Grid Abstraction Icosahedral Octahedral constepxrauto offsets = connectivity<edges, vertices, Color>::offsets(); for( auto off: offsets) { eval(flux) += eval( pd(off) ); } Dual-grid Cube sphere eval(flux) = eval(sum_on_vertices(0.0, pd)); The DSL syntax elements are grid independent. The same used code using DSL can be compiled for multiple Grids. GridToolswill support efficient native layouts for octahedral/icosahedral. GridToolswill use Atlas in order to support any grid. Carlos Osuna, MPI Seminar, 21 November
16 Example of GridTools DSL : MPDATA Carlos Osuna, MPI Seminar, 21 November
17 Example of GridTools DSL : MPDATA Carlos Osuna, MPI Seminar, 21 November
18 Composition of multiple MPDATA operators m_upwind_fluxes = make_computation<gpu>( domain_uwf, grid_, make_multistage(execute<forward>(), make_stage<upwind_flux, octahedral_topology_t, edges>( p_flux(), p_pd(), p_vn()), make_stage<upwind_fluz, octahedral_topology_t, vertices>( p_fluz(), p_pd(), p_wn()), make_stage<fluxzdiv, octahedral_topology_t, vertices>( p_divvd(), p_flux(), p_fluz(), p_dual_volumes(), p_edges_sign()), Full MPDATA implemented using the GridTools DSL: upwind fluxes minmax limiter compute fluxes Limit fluxes Flux solution Rho correction make_stage<advance_solution, octahedral_topology_t, vertices>( p_pd(), p_divvd(), p_rho()))); Carlos Osuna, MPI Seminar, 21 November
19 Grid Abstraction: Performance Portability Thanks to abstraction of the mesh we can implement any partitioning: Equal partitioner implemented with Atlas Structured partitioner implemented with Atlas O32 mesh Carlos Osuna, MPI Seminar, 21 November
20 Atlas: a library for NWP and climate modelling Willem Deconinck Carlos Osuna, MPI Seminar, 21 November Deconinck et al
21 Grid Abstraction: Performance Portability Abstraction of the mesh can implement any indexing layout Indexing has a large impact in performance: NVIDIA P100, 128x128x80, Bandwidth GB/s B SN DA 211 SN IA 270 UN IA 130 HN IA 256 Carlos Osuna, MPI Seminar, 21 November
22 DSLs and abstractions: state of the art GridTools and Atlas E solely relies on a C++ compiler No external tools provides a high-level of abstraction Performance portability But writing in this newprogramming modele is hard (C++ expertise) Decreased productivity is unsafe No protection from violating the parallel model often requires expert knowledge to produce efficient code Carlos Osuna, MPI Seminar, 21 November
23 What is the Future of GFD Models in a complex HPC environment? 1. Need to increase model development productivity 2. Need to decrease maintainence cost of models 3. Need to extend tools to a large set of models, methods, domains. 4. Need to maximize efficiency Carlos Osuna, MPI Seminar, 21 November
24 DSLToolchains for Performance Portable Geophysical Fluid Dynamic Models Carlos Osuna, MPI Seminar, 21 November
25 Need to increase model development productivity High leveldsls, concise, safe and intuitive syntax that remove boiler-plate from embedded GridTools Carlos Osuna, MPI Seminar, 21 November
26 Need to decrease maintainence of models Check for parallel errors, Auto-generation of boundary conditions and halo exchanges Check forout ofbounds Safe generation of loop data dependencies E. Carlos Osuna, MPI Seminar, 21 November
27 Need toextendtoolstoa large setofmodels, methods, domains. Options: ICON-Ocean IFS COSMO ICON-atmos Unifyall toolsin a singledsl softwarestack-> GFD maintainence models issue. LFRic One DSL (programming model) per model, currently leading to explosion of DSL solutions. ESCAPE2 Liszt HybridFortran GridTools DSLs and programming models PsyClone Firedrake CLAW Carlos Osuna, MPI Seminar, 21 November
28 Need toextendtoolstoa large setofmodels, methods, domains. Options: ICON-Ocean IFS Unifyall COSMO ICONtoolsin a singledsl softwarestack-> GFD maintainence models issue. LFRic One DSL (programming model) per model, currently leading to explosion of DSL solutions. ESCAPE2 Liszt HybridFortran GridTools DSLs and programming models PsyClone Firedrake CLAW Carlos Osuna, MPI Seminar, 21 November
29 In the ESCAPE2 DNA3 Develop programming languages for GFD models that can increase productivity, and lower maintainance Standardization, avoid explosion of tools. Support multiple domains(fv weather, FV ocean, structured, unstructured, FE) in the standardization Multiple domain languages or customizations, single tool Language to be defined together with scientists Carlos Osuna, MPI Seminar, 21 November
30 ESCAPE2 base technology Modular compiler toolchain(based on modern compilers llvm-) GTCLANG DAWN Carlos Osuna, MPI Seminar, 21 November
31 ESCAPE2 base technology Multiple variants of a language FE plugin IFS collocated ICON Ocean plugin a[i+1] on_cells(+,a) Unstructured Structured DAWN Carlos Osuna, MPI Seminar, 21 November
32 Example Horizontal Diffusion Carlos Osuna, MPI Seminar, 21 November 2017 Fabian Thüring
33 Example Horizontal Diffusion stencil_function laplacian { storage phi; Do { return 4.0 * phi phi[i+1] phi[i-1] phi[j+1] phi[j-1]; } }; stencil hd_type2 { storage hd, in; temporary_storage lap; Do { vertical_region(k_start, k_end) { lap = laplacian(in); hd = laplacian(lap); } } }; Carlos Osuna, MPI Seminar, 21 November
34 Safety First stencil hd_type2 { storage u, utens; temporary_storage uavg; Do { vertical_region(k_start, k_end) { uavg = (u[i+1]+u[i-1])*0.5; utens = (uavg[k+1]); } } }; Carlos Osuna, MPI Seminar, 21 November
35 Safety First stencil hd_type2 { storage u, utens; temporary_storage uavg; Do { vertical_region(k_start, k_end) { uavg = (u[i+1]+u[i-1])*0.5; utens = (uavg[j+1] + uavg[j-1]); } } }; Carlos Osuna, MPI Seminar, 21 November
36 Safety First stencil hd_type2 { storage u, utens; temporary_storage uavg; Do { vertical_region(k_start, k_end) { uavg = (u[i+1]+u[i-1])*0.5; utens = (uavg[j+1] + uavg[j-1]); } } }; Carlos Osuna, MPI Seminar, 21 November
37 Interoperatbility : Escaping the language stencil_function laplacian { storage phi; Do { return 4.0 * phi phi[i+1] phi[i-1] phi[j+1] phi[j-1]; } }; stencil hd_type2 { storage hd, in; temporary_storage lap; Do { vertical_region(k_start, k_end) { lap = laplacian(in); hd = laplacian(lap); } } }; #omp parallel for nowait #acc for(int i=0; i < isize; ++i) { for(int j=0; j < jsize; ++j) { for(int k=0; k < ksize; ++k) { udiff(i,j,k) = hd(i+1,j,k) + hd(i-1,j,k); } } } DSL code Standard C++ code (+OpenMP, +OpenACC, +CUDA) Carlos Osuna, MPI Seminar, 21 November
38 {«filename»:«/code/access_test.cpp», «stencils»:[ «name»:«hori_diff», «loc»:{ «line»: 24, «column»: 8 }, «ast»:{ «block_stmt»:{ «statements»:{ «expr_stmt»:{ «assignment_expr»:{ «left»:{ «field_access_expr»:{ «name»:«b», «offset»:[], Python } }, GridToolsT4Py «right»:{ «binary_op»:{ «left»:{ «field_access_expr»:{ «name»:«a», «offset»:[1,0,0], } }, «right»:{ «field_access_expr»:{ «name»:«a», «offset»:[-1,0,0], Carlos Osuna, MPI Seminar, 21 November }}}}}}}}}} } Standardization: SIR Multiple DSL (thin) frontends can generate standard SIR scheme C++ GTCLANG Fortran CLAW Fortran PsyClone ESCAPE2 / Ocean b = a(i+1) + a(i-1)
39 DAWN: Compiler optimization passes Carlos Osuna, MPI Seminar, 21 November
40 COSMO dycore evaluation of DSL toolchain Type 2 u Type 2 v Type 2 w Type 2 pp Smagorinsky Horizontal Diffusion (P100) Advection( metricand time-p100 -) Carlos Osuna, MPI Seminar, 21 November
41 Conclusions The future of GFD models, high resolution, multidecade simulation will bring serious computational challenges: efficiency and production cost, maintenance cost Weneedtofind rightprogrammingmodelsso thatadaptationto newarchitecturesdo not hinderscientificdevelopment-> high productivity Single effortsarenot a valid modelanymore, weneed collaborations ESCAPE2 is a great opportunity to define the new language with scientists, we have the technology in place. Carlos Osuna, MPI Seminar, 21 November
42 BACKUPS MPI Seminar, 21st Novemebr 2017
43 MPI Seminar, 21st Novemebr 2017
44 Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss MeteoSwiss Operation Center 1 CH-8058 Zurich-Airport T MeteoSvizzera Via ai Monti 146 CH-6605 Locarno-Monti T MétéoSuisse 7bis, av. de la Paix CH-1211 Genève 2 T MétéoSuisse Chemin de l Aérologie CH-1530 Payerne T Internal presentation, Valentin Clément 44
Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss. PP POMPA status.
Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss PP POMPA status Xavier Lapillonne Performance On Massively Parallel Architectures Last year of the project
More informationNext generation of Quality Management Tools at
Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss Next generation of Quality Management Tools at MeteoSwiss Data Management Workshop 2017 Zagreb M. Musa,
More informationPorting COSMO to Hybrid Architectures
Porting COSMO to Hybrid Architectures T. Gysi 1, O. Fuhrer 2, C. Osuna 3, X. Lapillonne 3, T. Diamanti 3, B. Cumming 4, T. Schroeder 5, P. Messmer 5, T. Schulthess 4,6,7 [1] Supercomputing Systems AG,
More informationDeutscher Wetterdienst
Porting Operational Models to Multi- and Many-Core Architectures Ulrich Schättler Deutscher Wetterdienst Oliver Fuhrer MeteoSchweiz Xavier Lapillonne MeteoSchweiz Contents Strong Scalability of the Operational
More informationGPU Developments for the NEMO Model. Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA
GPU Developments for the NEMO Model Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA NVIDIA HPC AND ESM UPDATE TOPICS OF DISCUSSION GPU PROGRESS ON NEMO MODEL 2 NVIDIA GPU
More informationCLAW FORTRAN Compiler source-to-source translation for performance portability
CLAW FORTRAN Compiler source-to-source translation for performance portability XcalableMP Workshop, Akihabara, Tokyo, Japan October 31, 2017 Valentin Clement valentin.clement@env.ethz.ch Image: NASA Summary
More informationAdapting Numerical Weather Prediction codes to heterogeneous architectures: porting the COSMO model to GPUs
Adapting Numerical Weather Prediction codes to heterogeneous architectures: porting the COSMO model to GPUs O. Fuhrer, T. Gysi, X. Lapillonne, C. Osuna, T. Dimanti, T. Schultess and the HP2C team Eidgenössisches
More informationSTELLA: A Domain Specific Language for Stencil Computations
STELLA: A Domain Specific Language for Stencil Computations, Center for Climate Systems Modeling, ETH Zurich Tobias Gysi, Department of Computer Science, ETH Zurich Oliver Fuhrer, Federal Office of Meteorology
More informationThe challenges of new, efficient computer architectures, and how they can be met with a scalable software development strategy.! Thomas C.
The challenges of new, efficient computer architectures, and how they can be met with a scalable software development strategy! Thomas C. Schulthess ENES HPC Workshop, Hamburg, March 17, 2014 T. Schulthess!1
More informationCLAW FORTRAN Compiler Abstractions for Weather and Climate Models
CLAW FORTRAN Compiler Abstractions for Weather and Climate Models Image: NASA PASC 17 June 27, 2017 Valentin Clement, Jon Rood, Sylvaine Ferrachat, Will Sawyer, Oliver Fuhrer, Xavier Lapillonne valentin.clement@env.ethz.ch
More informationPreparing a weather prediction and regional climate model for current and emerging hardware architectures.
Preparing a weather prediction and regional climate model for current and emerging hardware architectures. Oliver Fuhrer (MeteoSwiss), Tobias Gysi (Supercomputing Systems AG), Xavier Lapillonne (C2SM),
More informationAtlas, a library for NWP and climate modelling
Atlas, a library for NWP and climate modelling MPI-M Seminar 28 November 2017, Hamburg By Willem Deconinck willem.deconinck@ecmwf.int ECMWF December 18, 2017 The Integrated Forecasting System (IFS) Spectral
More informationPP POMPA (WG6) News and Highlights. Oliver Fuhrer (MeteoSwiss) and the whole POMPA project team. COSMO GM13, Sibiu
PP POMPA (WG6) News and Highlights Oliver Fuhrer (MeteoSwiss) and the whole POMPA project team COSMO GM13, Sibiu Task Overview Task 1 Performance analysis and documentation Task 2 Redesign memory layout
More informationThe CLAW project. Valentin Clément, Xavier Lapillonne. CLAW provides high-level Abstractions for Weather and climate models
Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss The CLAW project CLAW provides high-level Abstractions for Weather and climate models Valentin Clément,
More informationNVIDIA Update and Directions on GPU Acceleration for Earth System Models
NVIDIA Update and Directions on GPU Acceleration for Earth System Models Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA Carl Ponder, PhD, Applications Software Engineer, NVIDIA,
More informationAn update on the COSMO- GPU developments
An update on the COSMO- GPU developments COSMO User Workshop 2014 X. Lapillonne, O. Fuhrer, A. Arteaga, S. Rüdisühli, C. Osuna, A. Roches and the COSMO- GPU team Eidgenössisches Departement des Innern
More informationDynamical Core Rewrite
Dynamical Core Rewrite Tobias Gysi Oliver Fuhrer Carlos Osuna COSMO GM13, Sibiu Fundamental question How to write a model code which allows productive development by domain scientists runs efficiently
More informationDevelopments in Computing Technology: GPUs
Developments in Computing Technology: GPUs Mark Richardson, Technical Head, CEMAC Mark Richardson, CEMAC, GPU showcase 29 th November 2017 1 Welcome to first CEMAC seminar Will try to hold one every 3
More informationCOSMO Dynamical Core Redesign Tobias Gysi David Müller Boulder,
COSMO Dynamical Core Redesign Tobias Gysi David Müller Boulder, 8.9.2011 Supercomputing Systems AG Technopark 1 8005 Zürich 1 Fon +41 43 456 16 00 Fax +41 43 456 16 10 www.scs.ch Boulder, 8.9.2011, by
More informationPiz Daint: Application driven co-design of a supercomputer based on Cray s adaptive system design
Piz Daint: Application driven co-design of a supercomputer based on Cray s adaptive system design Sadaf Alam & Thomas Schulthess CSCS & ETHzürich CUG 2014 * Timelines & releases are not precise Top 500
More informationMemory Footprint of Locality Information On Many-Core Platforms Brice Goglin Inria Bordeaux Sud-Ouest France 2018/05/25
ROME Workshop @ IPDPS Vancouver Memory Footprint of Locality Information On Many- Platforms Brice Goglin Inria Bordeaux Sud-Ouest France 2018/05/25 Locality Matters to HPC Applications Locality Matters
More informationThe ECMWF forecast model, quo vadis?
The forecast model, quo vadis? by Nils Wedi European Centre for Medium-Range Weather Forecasts wedi@ecmwf.int contributors: Piotr Smolarkiewicz, Mats Hamrud, George Mozdzynski, Sylvie Malardel, Christian
More informationA PSyclone perspec.ve of the big picture. Rupert Ford STFC Hartree Centre
A PSyclone perspec.ve of the big picture Rupert Ford STFC Hartree Centre Requirement I Maintainable so,ware maintain codes in a way that subject ma7er experts can s:ll modify the code Leslie Hart from
More informationTORSTEN HOEFLER
TORSTEN HOEFLER MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures most work performed by TOBIAS GYSI AND TOBIAS GROSSER Stencil computations (oh no,
More informationGPU Developments in Atmospheric Sciences
GPU Developments in Atmospheric Sciences Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA David Hall, Ph.D., Sr. Solutions Architect, NVIDIA, Boulder, CO, USA NVIDIA HPC UPDATE
More informationPorting the ICON Non-hydrostatic Dynamics to GPUs
Porting the ICON Non-hydrostatic Dynamics to GPUs William Sawyer (CSCS/ETH), Christian Conti (ETH), Gilles Fourestey (CSCS/ETH) IS-ENES Workshop on Dynamical Cores Dec. 14-16, 2011, Carlo V Castle, Lecce,
More informationMODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures
TORSTEN HOEFLER MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures with support of Tobias Gysi, Tobias Grosser @ SPCL presented at Guangzhou, China,
More informationDeutscher Wetterdienst. Ulrich Schättler Deutscher Wetterdienst Research and Development
Deutscher Wetterdienst COSMO, ICON and Computers Ulrich Schättler Deutscher Wetterdienst Research and Development Contents Problems of the COSMO-Model on HPC architectures POMPA and The ICON Model Outlook
More informationStan Posey, NVIDIA, Santa Clara, CA, USA
Stan Posey, sposey@nvidia.com NVIDIA, Santa Clara, CA, USA NVIDIA Strategy for CWO Modeling (Since 2010) Initial focus: CUDA applied to climate models and NWP research Opportunities to refactor code with
More informationTrends in HPC (hardware complexity and software challenges)
Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18
More informationGPU Consideration for Next Generation Weather (and Climate) Simulations
GPU Consideration for Next Generation Weather (and Climate) Simulations Oliver Fuhrer 1, Tobias Gisy 2, Xavier Lapillonne 3, Will Sawyer 4, Ugo Varetto 4, Mauro Bianco 4, David Müller 2, and Thomas C.
More informationMIR. ECMWF s New Interpolation Package. P. Maciel, T. Quintino, B. Raoult, M. Fuentes, S. Villaume ECMWF. ECMWF March 9, 2016
MIR ECMWF s New Interpolation Package P. Maciel, T. Quintino, B. Raoult, M. Fuentes, S. Villaume ECMWF mars-admins@ecmwf.int ECMWF March 9, 2016 Upgrading the Interpolation Package Interpolation is pervasive:
More informationParallel I/O in the LFRic Infrastructure. Samantha V. Adams Workshop on Exascale I/O for Unstructured Grids th September 2017, DKRZ, Hamburg.
Parallel I/O in the LFRic Infrastructure Samantha V. Adams Workshop on Exascale I/O for Unstructured Grids 25-26 th September 2017, DKRZ, Hamburg. Talk Overview Background and Motivation for the LFRic
More informationRunning the FIM and NIM Weather Models on GPUs
Running the FIM and NIM Weather Models on GPUs Mark Govett Tom Henderson, Jacques Middlecoff, Jim Rosinski, Paul Madden NOAA Earth System Research Laboratory Global Models 0 to 14 days 10 to 30 KM resolution
More informationProgramming Models for Multi- Threading. Brian Marshall, Advanced Research Computing
Programming Models for Multi- Threading Brian Marshall, Advanced Research Computing Why Do Parallel Computing? Limits of single CPU computing performance available memory I/O rates Parallel computing allows
More informationDOI: /jsfi Towards a performance portable, architecture agnostic implementation strategy for weather and climate models
DOI: 10.14529/jsfi140103 Towards a performance portable, architecture agnostic implementation strategy for weather and climate models Oliver Fuhrer 1, Carlos Osuna 2, Xavier Lapillonne 2, Tobias Gysi 3,4,
More informationAn Introduction to the LFRic Project
An Introduction to the LFRic Project Mike Hobson Acknowledgements: LFRic Project Met Office: Sam Adams, Tommaso Benacchio, Matthew Hambley, Mike Hobson, Chris Maynard, Tom Melvin, Steve Mullerworth, Stephen
More informationFrom Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation Erik Schnetter, Perimeter Institute with M. Blazewicz, I. Hinder, D. Koppelman, S. Brandt, M. Ciznicki, M.
More informationEULAG: high-resolution computational model for research of multi-scale geophysical fluid dynamics
Zbigniew P. Piotrowski *,** EULAG: high-resolution computational model for research of multi-scale geophysical fluid dynamics *Geophysical Turbulence Program, National Center for Atmospheric Research,
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Waiting for Moore s Law to save your serial code started getting bleak in 2004 Source: published SPECInt
More informationParallel Computing & Accelerators. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
Parallel Computing Accelerators John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Purpose of this talk This is the 50,000 ft. view of the parallel computing landscape. We want
More informationPERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015
PERFORMANCE PORTABILITY WITH OPENACC Jeff Larkin, NVIDIA, November 2015 TWO TYPES OF PORTABILITY FUNCTIONAL PORTABILITY PERFORMANCE PORTABILITY The ability for a single code to run anywhere. The ability
More informationParallel Programming. Libraries and Implementations
Parallel Programming Libraries and Implementations Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationParallel Programming Libraries and implementations
Parallel Programming Libraries and implementations Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License.
More informationNVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU
NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU GPGPU opens the door for co-design HPC, moreover middleware-support embedded system designs to harness the power of GPUaccelerated
More informationGPGPU Offloading with OpenMP 4.5 In the IBM XL Compiler
GPGPU Offloading with OpenMP 4.5 In the IBM XL Compiler Taylor Lloyd Jose Nelson Amaral Ettore Tiotto University of Alberta University of Alberta IBM Canada 1 Why? 2 Supercomputer Power/Performance GPUs
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Moore's Law abandoned serial programming around 2004 Courtesy Liberty Computer Architecture Research Group
More informationDynamic Cuda with F# HPC GPU & F# Meetup. March 19. San Jose, California
Dynamic Cuda with F# HPC GPU & F# Meetup March 19 San Jose, California Dr. Daniel Egloff daniel.egloff@quantalea.net +41 44 520 01 17 +41 79 430 03 61 About Us! Software development and consulting company!
More informationTOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT
TOOLS FOR IMPROVING CROSS-PLATFORM SOFTWARE DEVELOPMENT Eric Kelmelis 28 March 2018 OVERVIEW BACKGROUND Evolution of processing hardware CROSS-PLATFORM KERNEL DEVELOPMENT Write once, target multiple hardware
More informationA Simulation of Global Atmosphere Model NICAM on TSUBAME 2.5 Using OpenACC
A Simulation of Global Atmosphere Model NICAM on TSUBAME 2.5 Using OpenACC Hisashi YASHIRO RIKEN Advanced Institute of Computational Science Kobe, Japan My topic The study for Cloud computing My topic
More informationcdo Data Processing (and Production) Luis Kornblueh, Uwe Schulzweida, Deike Kleberg, Thomas Jahns, Irina Fast
cdo Data Processing (and Production) Luis Kornblueh, Uwe Schulzweida, Deike Kleberg, Thomas Jahns, Irina Fast Max-Planck-Institut für Meteorologie, DKRZ September 24, 2014 MAX-PLANCK-GESELLSCHAFT Data
More informationStatus of the COSMO GPU version
Federal Department of Home Affairs FDHA Federal Office of Meteorology and Climatology MeteoSwiss Status of the COSMO GPU version Xavier Lapillonne Contributors in 2015 (Thanks!) Alon Shtivelman Andre Walser
More informationTowards Exascale Computing with the Atmospheric Model NUMA
Towards Exascale Computing with the Atmospheric Model NUMA Andreas Müller, Daniel S. Abdi, Michal Kopera, Lucas Wilcox, Francis X. Giraldo Department of Applied Mathematics Naval Postgraduate School, Monterey
More informationPerformance Analysis of the PSyKAl Approach for a NEMO-based Benchmark
Performance Analysis of the PSyKAl Approach for a NEMO-based Benchmark Mike Ashworth, Rupert Ford and Andrew Porter Scientific Computing Department and STFC Hartree Centre STFC Daresbury Laboratory United
More informationPorting the ICON Non-hydrostatic Dynamics and Physics to GPUs
Porting the ICON Non-hydrostatic Dynamics and Physics to GPUs William Sawyer (CSCS/ETH), Christian Conti (ETH), Xavier Lapillonne (C2SM/ETH) Programming weather, climate, and earth-system models on heterogeneous
More informationThe Icosahedral Nonhydrostatic (ICON) Model
The Icosahedral Nonhydrostatic (ICON) Model Scalability on Massively Parallel Computer Architectures Florian Prill, DWD + the ICON team 15th ECMWF Workshop on HPC in Meteorology October 2, 2012 ICON =
More informationAnalysis of Performance Gap Between OpenACC and the Native Approach on P100 GPU and SW26010: A Case Study with GTC-P
Analysis of Performance Gap Between OpenACC and the Native Approach on P100 GPU and SW26010: A Case Study with GTC-P Stephen Wang 1, James Lin 1, William Tang 2, Stephane Ethier 2, Bei Wang 2, Simon See
More informationDeutscher Wetterdienst
Accelerating Work at DWD Ulrich Schättler Deutscher Wetterdienst Roadmap Porting operational models: revisited Preparations for enabling practical work at DWD My first steps with the COSMO on a GPU First
More informationShifter: Fast and consistent HPC workflows using containers
Shifter: Fast and consistent HPC workflows using containers CUG 2017, Redmond, Washington Lucas Benedicic, Felipe A. Cruz, Thomas C. Schulthess - CSCS May 11, 2017 Outline 1. Overview 2. Docker 3. Shifter
More informationOptimising the Mantevo benchmark suite for multi- and many-core architectures
Optimising the Mantevo benchmark suite for multi- and many-core architectures Simon McIntosh-Smith Department of Computer Science University of Bristol 1 Bristol's rich heritage in HPC The University of
More informationOpportunities & Challenges for Piz Daint s Cray XC50 with ~5000 P100 GPUs. Thomas C. Schulthess
Opportunities & Challenges for Piz Daint s Cray XC50 with ~5000 P100 GPUs Thomas C. Schulthess 1 Piz Daint 2017 fact sheet ~5 000 NVIDIA P100 GPU accelerated nodes ~1 400 Dual multi-core socket nodes Model
More informationAccelerating Extreme-Scale Numerical Weather Prediction
Accelerating Extreme-Scale Numerical Weather Prediction Willem Deconinck (B), Mats Hamrud, Christian Kühnlein, George Mozdzynski, Piotr K. Smolarkiewicz, Joanna Szmelter 2, and Nils P. Wedi European Centre
More informationPorting The Spectral Element Community Atmosphere Model (CAM-SE) To Hybrid GPU Platforms
Porting The Spectral Element Community Atmosphere Model (CAM-SE) To Hybrid GPU Platforms http://www.scidacreview.org/0902/images/esg13.jpg Matthew Norman Jeffrey Larkin Richard Archibald Valentine Anantharaj
More informationDirective-based Programming for Highly-scalable Nodes
Directive-based Programming for Highly-scalable Nodes Doug Miles Michael Wolfe PGI Compilers & Tools NVIDIA Cray User Group Meeting May 2016 Talk Outline Increasingly Parallel Nodes Exposing Parallelism
More informationThe Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer
The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer William Killian Tom Scogland, Adam Kunen John Cavazos Millersville University of Pennsylvania
More informationA PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers Maxime Martinasso, Grzegorz Kwasniewski, Sadaf R. Alam, Thomas C. Schulthess, Torsten Hoefler Swiss National Supercomputing
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Moore's Law abandoned serial programming around 2004 Courtesy Liberty Computer Architecture Research Group
More informationOP2 FOR MANY-CORE ARCHITECTURES
OP2 FOR MANY-CORE ARCHITECTURES G.R. Mudalige, M.B. Giles, Oxford e-research Centre, University of Oxford gihan.mudalige@oerc.ox.ac.uk 27 th Jan 2012 1 AGENDA OP2 Current Progress Future work for OP2 EPSRC
More informationManaging HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory
Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Quinn Mitchell HPC UNIX/LINUX Storage Systems ORNL is managed by UT-Battelle for the US Department of Energy U.S. Department
More informationGPU Computing with NVIDIA s new Kepler Architecture
GPU Computing with NVIDIA s new Kepler Architecture Axel Koehler Sr. Solution Architect HPC HPC Advisory Council Meeting, March 13-15 2013, Lugano 1 NVIDIA: Parallel Computing Company GPUs: GeForce, Quadro,
More informationHigh-level Abstraction for Block Structured Applications: A lattice Boltzmann Exploration
High-level Abstraction for Block Structured Applications: A lattice Boltzmann Exploration Jianping Meng, Xiao-Jun Gu, David R. Emerson, Gihan Mudalige, István Reguly and Mike B Giles Scientific Computing
More informationAdvances in Time-Parallel Four Dimensional Data Assimilation in a Modular Software Framework
Advances in Time-Parallel Four Dimensional Data Assimilation in a Modular Software Framework Brian Etherton, with Christopher W. Harrop, Lidia Trailovic, and Mark W. Govett NOAA/ESRL/GSD 28 October 2016
More informationAchieving Portable Performance for GTC-P with OpenACC on GPU, multi-core CPU, and Sunway Many-core Processor
Achieving Portable Performance for GTC-P with OpenACC on GPU, multi-core CPU, and Sunway Many-core Processor Stephen Wang 1, James Lin 1,4, William Tang 2, Stephane Ethier 2, Bei Wang 2, Simon See 1,3
More informationACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015
ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 COMMODITY DISRUPTS CUSTOM SOURCE: Top500 ACCELERATED COMPUTING: THE PATH FORWARD It s time to start
More informationOpenACC. Introduction and Evolutions Sebastien Deldon, GPU Compiler engineer
OpenACC Introduction and Evolutions Sebastien Deldon, GPU Compiler engineer 3 WAYS TO ACCELERATE APPLICATIONS Applications Libraries Compiler Directives Programming Languages Easy to use Most Performance
More informationAdvanced Computation and I/O Methods for Earth-System Simulations Status update
Advanced Computation and I/O Methods for Earth-System Simulations Status update Nabeeh Jum ah, Anastasiia Novikova, Julian M. Kunkel, Thomas Ludwig, Thomas Dubos, Naoya Maruyama, Takayuki Aoki, Günther
More informationA performance portable implementation of HOMME via the Kokkos programming model
E x c e p t i o n a l s e r v i c e i n t h e n a t i o n a l i n t e re s t A performance portable implementation of HOMME via the Kokkos programming model L.Bertagna, M.Deakin, O.Guba, D.Sunderland,
More informationOverlapping Computation and Communication for Advection on Hybrid Parallel Computers
Overlapping Computation and Communication for Advection on Hybrid Parallel Computers James B White III (Trey) trey@ucar.edu National Center for Atmospheric Research Jack Dongarra dongarra@eecs.utk.edu
More informationIntroducing Overdecomposition to Existing Applications: PlasComCM and AMPI
Introducing Overdecomposition to Existing Applications: PlasComCM and AMPI Sam White Parallel Programming Lab UIUC 1 Introduction How to enable Overdecomposition, Asynchrony, and Migratability in existing
More informationPSyclone Separation of Concerns for HPC Codes. Dr. Joerg Henrichs Computational Science Manager Bureau of Meteorology
PSyclone Separation of Concerns for HPC Codes Dr. Joerg Henrichs Computational Science Manager Bureau of Meteorology PSyclone PSyclone developed by The Hartree Centre STFC Daresbury Laboratory, UK (since
More informationTowards an Efficient CPU-GPU Code Hybridization: a Simple Guideline for Code Optimizations on Modern Architecture with OpenACC and CUDA
Towards an Efficient CPU-GPU Code Hybridization: a Simple Guideline for Code Optimizations on Modern Architecture with OpenACC and CUDA L. Oteski, G. Colin de Verdière, S. Contassot-Vivier, S. Vialle,
More informationGPU-Powered WRF in the Cloud for Research and Operational Applications
GPU-Powered WRF in the Cloud for Research and Operational Applications John Manobianco, Chief Scientist Don Berchoff, Chief Technical Officer john@tempoquest.com, don@tempoquest.com 2017 Modeling Research
More informationLLVM for the future of Supercomputing
LLVM for the future of Supercomputing Hal Finkel hfinkel@anl.gov 2017-03-27 2017 European LLVM Developers' Meeting What is Supercomputing? Computing for large, tightly-coupled problems. Lots of computational
More informationLFRic: Developing the next generation atmospheric mode for the Metoffice
LFRic: Developing the next generation atmospheric mode for the Metoffice Dr Christopher Maynard, Met Office 27 September 2017 LFRic Acknowledgements S. Adams, M. Ashworth, T. Benacchio, R. Ford, M. Glover,
More informationExperiences with CUDA & OpenACC from porting ACME to GPUs
Experiences with CUDA & OpenACC from porting ACME to GPUs Matthew Norman Irina Demeshko Jeffrey Larkin Aaron Vose Mark Taylor ORNL is managed by UT-Battelle for the US Department of Energy ORNL Sandia
More informationEnd to End Optimization Stack for Deep Learning
End to End Optimization Stack for Deep Learning Presenter: Tianqi Chen Paul G. Allen School of Computer Science & Engineering University of Washington Collaborators University of Washington AWS AI Team
More informationRAMSES on the GPU: An OpenACC-Based Approach
RAMSES on the GPU: An OpenACC-Based Approach Claudio Gheller (ETHZ-CSCS) Giacomo Rosilho de Souza (EPFL Lausanne) Romain Teyssier (University of Zurich) Markus Wetzstein (ETHZ-CSCS) PRACE-2IP project EU
More informationAFOSR BRI: Codifying and Applying a Methodology for Manual Co-Design and Developing an Accelerated CFD Library
AFOSR BRI: Codifying and Applying a Methodology for Manual Co-Design and Developing an Accelerated CFD Library Synergy@VT Collaborators: Paul Sathre, Sriram Chivukula, Kaixi Hou, Tom Scogland, Harold Trease,
More informationPyFR: Heterogeneous Computing on Mixed Unstructured Grids with Python. F.D. Witherden, M. Klemm, P.E. Vincent
PyFR: Heterogeneous Computing on Mixed Unstructured Grids with Python F.D. Witherden, M. Klemm, P.E. Vincent 1 Overview Motivation. Accelerators and Modern Hardware Python and PyFR. Summary. Motivation
More informationPLAN-E Workshop Switzerland. Welcome! September 8, 2016
PLAN-E Workshop Switzerland Welcome! September 8, 2016 The Swiss National Supercomputing Centre Driving innovation in computational research in Switzerland Michele De Lorenzi (CSCS) PLAN-E September 8,
More informationOpenACC programming for GPGPUs: Rotor wake simulation
DLR.de Chart 1 OpenACC programming for GPGPUs: Rotor wake simulation Melven Röhrig-Zöllner, Achim Basermann Simulations- und Softwaretechnik DLR.de Chart 2 Outline Hardware-Architecture (CPU+GPU) GPU computing
More informationEvaluating the Performance and Energy Efficiency of the COSMO-ART Model System
Evaluating the Performance and Energy Efficiency of the COSMO-ART Model System Joseph Charles & William Sawyer (CSCS), Manuel F. Dolz (UHAM), Sandra Catalán (UJI) EnA-HPC, Dresden September 1-2, 2014 1
More informationParallel Computing & Accelerators. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
Parallel Computing Accelerators John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Purpose of this talk This is the 50,000 ft. view of the parallel computing landscape. We want
More informationHARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA
HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES Cliff Woolley, NVIDIA PREFACE This talk presents a case study of extracting parallelism in the UMT2013 benchmark for 3D unstructured-mesh
More informationImproving Uintah s Scalability Through the Use of Portable
Improving Uintah s Scalability Through the Use of Portable Kokkos-Based Data Parallel Tasks John Holmen1, Alan Humphrey1, Daniel Sunderland2, Martin Berzins1 University of Utah1 Sandia National Laboratories2
More informationAMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016
AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING BILL.BRANTLEY@AMD.COM, FELLOW 3 OCTOBER 2016 AMD S VISION FOR EXASCALE COMPUTING EMBRACING HETEROGENEITY CHAMPIONING OPEN SOLUTIONS ENABLING LEADERSHIP
More informationPortable and Productive Performance with OpenACC Compilers and Tools. Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc.
Portable and Productive Performance with OpenACC Compilers and Tools Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. 1 Cray: Leadership in Computational Research Earth Sciences
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationAmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015
AmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015 Agenda Introduction to AmgX Current Capabilities Scaling V2.0 Roadmap for the future 2 AmgX Fast, scalable linear solvers, emphasis on iterative
More informationA Simple Guideline for Code Optimizations on Modern Architectures with OpenACC and CUDA
A Simple Guideline for Code Optimizations on Modern Architectures with OpenACC and CUDA L. Oteski, G. Colin de Verdière, S. Contassot-Vivier, S. Vialle, J. Ryan Acks.: CEA/DIFF, IDRIS, GENCI, NVIDIA, Région
More information