First Experiences with Application Development with Fortran Damian Rouson
|
|
- Pamela Stanley
- 5 years ago
- Views:
Transcription
1 First Experiences with Application Development with Fortran 2018 Damian Rouson
2 Overview Fortran 2018 in a Nutshell ICAR & Coarray ICAR WRF-Hydro Results Conclusions
3 Overview Fortran 2018 in a Nutshell Motivation: exascale challenges Background: SPMD & PGAS in Fortran 2018 Beyond CAF WRF-Hydro ICAR & Coarray ICAR Results Conclusions
4 EXASCALE CHALLENGES & Fortran 2018 Response Billion-way concurrency with high levels of on-chip parallelism events collective subroutines richer set of atomic subroutines teams Source: Ang, J.A., Barrett, R.F., Benner, R.E., Burke, D., Chan, C., Cook, J., Donofrio, D., Hammond, S.D., Hemmert, K.S., Kelly, S.M. and Le, H., 2014, November. Abstract machine models and proxy architectures for exascale computing. In Hardware-Software Slide / 01 Co-Design for High Performance Computing (Co-HPC), 2014 (pp ). IEEE. Higher failure rates failed-image detection Expensive data movement one-sided communication teams (locality control) Heterogeneous hardware on processor events
5 Single Program Multiple Data Images (SPMD) Image 1 Image 2 Image N Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove. Sign
6 Single Program Multiple Data Images (SPMD) Image 1 Image 2 Image N Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove. Sign
7 Single Program Multiple Data Images (SPMD) {processes threads} Image 1 Image 2 Image N... Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove. Sign
8 Single Program Multiple Data Images (SPMD) {processes threads} Image 1 Image 2 Image N... Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove. Sign
9 Single Program Multiple Data Images (SPMD) {processes threads} Image 1 Image 2 Image N... Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove. Sign sync all
10 Single Program Multiple Data Images (SPMD) {processes threads} Image 1 Image 2 Image N... Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove. sync all Sign sync all
11 Single Program Multiple Data Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove. sync all Images (SPMD) {processes threads} Image 1 Image 2 Image N Sign sync all... sync all Images execute asynchronously up to a programmer-specified synchronization: sync all sync images allocate/deallocate
12 Partitioned Global Address Space (PGAS)
13 Partitioned Global Address Space (PGAS) Coarrays integrate with other languages features: Fortran 90 array syntax works on local data. Communicate objects
14 Partitioned Global Address Space (PGAS) Coarrays integrate with other languages features: Fortran 90 array syntax works on local data. Communicate objects The ability to drop the square brackets harbors two important implications: Easily determine where communication occurs. Parallelize legacy code with minimal revisions.
15 if (me<n) a(1:2) = a(2:3)[me+1] Image 1 Image 2 Image 3 Sign
16 if (me<n) a(1:2) = a(2:3)[me+1] Image 1 Image 2 Image 3 Sign
17 if (me<n) a(1:2) = a(2:3)[me+1] Image 1 Image 2 Image 3 Sign
18 if (me<n) a(1:2) = a(2:3)[me+1] Image 1 Image 2 Image 3 Sign
19 if (me<n) a(1:2) = a(2:3)[me+1] Image 1 Image 2 Image 3 Sign
20 if (me==1) a(5)[2] = a(5)[3] Image 1 Image 2 Image 3 Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove. Sign
21 if (me==1) a(5)[2] = a(5)[3] Image 1 Image 2 Image 3 Far far away, behind the word mountains, far from the countries Vokalia and Consonantia, there live the blind texts. Separated they live in Bookmarksgrove. Sign
22 Segment Ordering: An intrinsic module provides the derived type event_type, which encapsulates an atomic_int_kind integer component default-initialized to zero. Events An image increments the event count on a remote image by executing event post. The remote image obtains the post count by executing event_query. Image Control Side Effect event post x atomic_add 1 event_query defines count event wait x atomic_add -1
23 vents ello, world!. greeting_ready(2:n)[1] ok_to_overwrite[3]
24 vents ello, world!. greeting_ready(2:n)[1] ok_to_overwrite[3]
25 vents ello, world! query. greeting_ready(2:n)[1] ok_to_overwrite[3]
26 vents ello, world! query. greeting_ready(2:n)[1] ok_to_overwrite[3]
27 vents ello, world! query post. greeting_ready(2:n)[1] ok_to_overwrite[3]
28 vents ello, world! query post. greeting_ready(2:n)[1] ok_to_overwrite[3]
29 vents ello, world! query post. wait greeting_ready(2:n)[1] ok_to_overwrite[3]
30 vents ello, world! query post. wait greeting_ready(2:n)[1] ok_to_overwrite[3]
31 vents ello, world! query post. post wait greeting_ready(2:n)[1] ok_to_overwrite[3]
32 vents ello, world! query post. post wait greeting_ready(2:n)[1] ok_to_overwrite[3] Performance-oriented constraints: Query and wait must be local. Post and wait are disallowed in do concurrent constructs.
33 vents ello, world! query post. post wait greeting_ready(2:n)[1] ok_to_overwrite[3] Performance-oriented constraints: Query and wait must be local. Post and wait are disallowed in do concurrent constructs. Pro tips: Overlap communication and computation. Wherever safety permits, query without waiting. Write a spin-query-work loop & build a logical mask describing the remaining work.
34 TEAM A set of images that can readily execute independently of other images Team 2 Team 1 Team 2 Image 1 Image 2 Image 3 Image 4 Image 5 Image 6 a(1:4)[1] a(1:4)[2] a(1:4)[2] a(1:4)[4] a(1:4)[5] a(1:4)[6] a(1:4)[4] a(1:4)[3]
35 Collective Subroutines Each non-failed image of the current team must invoke the collective. After parallel calculation/communication, the result is placed on one or all images. Optional arguments: stat, errmsg, result_image, source_image All collectives have intent(inout) argument A holding the input data and may hold the result on return, depending on result_image No implicit synchronization at beginning/end, which allows for overlap with other actions. No image s data is accessed before that image invokes the collective subroutine.
36 Extensible Set of Collective Subroutines
37 Performance User-Defined vs Intrinsic Collectives NERSC Hopper: Xeon nodes on Cray XE6 KNL nodes on a Cray XC Execution Time (sec) co_sum_int32 sum_bin_tree_int32 sum_bin_tree_real64 sum_rec_doubling_int32 sum_rec_doubling_real64 sum_alpha_tree_real e Num. of images
38 Failure Detection Image 1 Image 2 Image 3 Image 4 Image 5 Image n a(1:4)[1] a(1:4)[2] a(1:4)[3] a(1:4)[4] a(1:4)[5] a(1:4)[n] use iso_fortran_env, only : STAT_FAILED_IMAGE integer :: status sync all(stat==status) if (status==stat_failed_image) call fault_tolerant_algorithm()
39 Failure Detection Image 1 Image 2 Image 3 Image 4 Image 5 Image n a(1:4)[1] a(1:4)[2] a(1:4)[3] a(1:4)[4] a(1:4)[5] a(1:4)[n] use iso_fortran_env, only : STAT_FAILED_IMAGE integer :: status sync all(stat==status) if (status==stat_failed_image) call fault_tolerant_algorithm()
40 Failure Detection Image 1 Image 2 Image 3 Image 4 Image 5 Image n a(1:4)[1] a(1:4)[2] a(1:4)[3] a(1:4)[4] a(1:4)[5] a(1:4)[n]
41 FORTRAN 2018 Failed-Image Detection FAIL IMAGE (simulates a failures) IMAGE_STATUS (checks the status of a specific image) FAILED_IMAGES (provides the list of failed images) Coarray operations may fail. STAT= attribute used to check correct behavior.
42 Overview Fortran 2018 in a Nutshell ICAR & Coarray ICAR Results Conclusions
43 The Climate Downscaling Problem Climate model (top row) and application needs (bottom row) Topography Computational Cost High-res regional model >10 billion core hours (CONUS 4km 150yrs 40 scenarios) Precipitation
44 The Intermediate Complexity Atmospheric Research Model (ICAR) Analytical solution for flow over topography (right) 90% of the information for 1% of the computational cost ICAR Wind Field over Topography Core Numerics: 90% of cost = Cloud physics grid-cells independent 5% of cost = Advection fully explicit, requires local neighbor communication Gutmann et al (2016) J. of Hydrometeorology, 17 p957. doi: /jhm-d
45 ICAR Intermediate Complexity Atmospheric Research Animation of Water vapor (blue) and Precipitation (Green to Red) over the contiguous United States Output timestep : 1hr Integration timestep : Variable (~30s) Boundary Conditions: ERA-interim (historical) Run on 16 cores with OpenMP Limited scalability
46 ICAR Intermediate Complexity Atmospheric Research Animation of Water vapor (blue) and Precipitation (Green to Red) over the contiguous United States Output timestep : 1hr Integration timestep : Variable (~30s) Boundary Conditions: ERA-interim (historical) Run on 16 cores with OpenMP Limited scalability
47 ICAR comparison to Full-physics atmospheric model (WRF) Ideal Hill case ICAR (red) and WRF (blue) precipitation over an idealized hill (green) Real Simulation WRF (left) and ICAR (right) Season total precipitation over Colorado Rockies
48 ICAR Users and Applications
49 Coarray ICAR Mini-App Object-oriented design Collective broadcast of initial data Overlaps communication & computation via one-sided puts. Coarray halo exchanges with 3D, 1st-order upwind advection (~2000 lines of new code) Cloud microphysics (~5000 lines of preexisting code)
50 Overview Fortran 2018 in a Nutshell ICAR & Coarray ICAR WRF-Hydro Results Conclusions
51 WRF-Hydro Weather Research Forecast Hydrological Model Currently, ensemble runs involve redundant initialization calculations that occupy approximately 30% of the runtime and use only the processes allocated to one ensemble member. Our use of Fortran 2018 teams will eliminate the redundancy and the amortization of that effort across all of the processes allocated for the full ensemble of simulations.
52 ICAR comparison to Full-physics atmospheric model (WRF) Ideal Hill case ICAR (red) and WRF (blue) precipitation over an idealized hill (green) Real Simulation WRF (left) and ICAR (right) Season total precipitation over Colorado Rockies
53 Mapping Fortran Teams onto MPI Fortran statement or procedure MPI procedure or variable form team( ) MPI_Barrier( ) MPI_Comm_Split( ) time team_number( ) end team color MPI_Barrier( ) 1 2 3
54 Communicator Hand-off (CAF to MPI)
55 Communicator Hand-off (CAF to MPI) CAF in the driver seat: CAF compiler embeds MPI_Init, MPI_Finalize Language extension exposes MPI communicator
56 Communicator Hand-off (CAF to MPI) CAF in the driver seat: CAF compiler embeds MPI_Init, MPI_Finalize MPI under the hood: Language extension exposes MPI communicator Only trivial modification of MPI application Future work: amortize & speed up initialization
57 Overview Fortran 2018 in a Nutshell ICAR & Coarray ICAR WRF-Hydro Results Conclusions
58 OpenSHMEM vs MPI Puts vs gets Coarray ICAR Simulation Time 500 x 500 x 20 grid 2000 x 2000 x 20 grid Number of processes Number of processes GCC 6.3 on NCAR Cheyenne SGI ICE XA Cluster: 4032 nodes, 2 x 18-core Xeon 2.3 GHz
59 Coarray ICAR Speedup 500 x 500 x 20 grid 2000 x 2000 x 20 grid Number of processes Number of processes GCC 6.3 on NCAR Cheyenne SGI ICE XA Cluster: 4032 nodes, 2 x 18-core Xeon 2.3 GHz
60 Coarray ICAR Xeon vs KNL Number of processes Number of processes Compilers & platforms: GNU on Cheyenne; Cray compiler on Edison (Broadwell), Cori (KNL).
61 Coarray ICAR At Scale Number of processes Cray compiler on Edison. Number of processes
62 Conclusions Fortran 2018 is a PGAS language supporting SPMD programming at scale. Programming-model agnosticism is a lifesaver. High productivity pays off: from sharedmemory parallelism to 100,000 cores in ~100 person-hours. CAF interoperates seamlessly with MPI
63 Acknowledgements & References Ethan Gutman Alessandro Fanfarillo James McCreight Brian Friesen Rouson, D., Gutmann, E. D., Fanfarillo, A., & Friesen, B. (2017, November). Performance portability of an intermediatecomplexity atmospheric research model in coarray Fortran. In Proceedings of the Second Annual PGAS Applications Workshop (p. 4). ACM. Rouson, D., McCreight, J. L., & Fanfarillo, A. (2017, November). Incremental caffeination of a terrestrial hydrological modeling framework using Fortran 2018 teams. In Proceedings of the Second Annual PGAS Applications Workshop (p. 6). ACM.
Towards Exascale Computing with Fortran 2015
Towards Exascale Computing with Fortran 2015 Alessandro Fanfarillo National Center for Atmospheric Research Damian Rouson Sourcery Institute Outline Parallelism in Fortran 2008 SPMD PGAS Exascale challenges
More informationLubuntu Linux Virtual Machine
Lubuntu Linux 18.04 Virtual Machine About Us Slide / 01 Founded in 2015, Sourcery Institute is a California nonprofit public-benefit corporation engaged in research, education, and advisory services in
More informationLecture V: Introduction to parallel programming with Fortran coarrays
Lecture V: Introduction to parallel programming with Fortran coarrays What is parallel computing? Serial computing Single processing unit (core) is used for solving a problem One task processed at a time
More informationParallel Programming in Fortran with Coarrays
Parallel Programming in Fortran with Coarrays John Reid, ISO Fortran Convener, JKR Associates and Rutherford Appleton Laboratory Fortran 2008 is now in FDIS ballot: only typos permitted at this stage.
More informationFortran 2008: what s in it for high-performance computing
Fortran 2008: what s in it for high-performance computing John Reid, ISO Fortran Convener, JKR Associates and Rutherford Appleton Laboratory Fortran 2008 has been completed and is about to be published.
More informationAdditional Parallel Features in Fortran An Overview of ISO/IEC TS 18508
Additional Parallel Features in Fortran An Overview of ISO/IEC TS 18508 Dr. Reinhold Bader Leibniz Supercomputing Centre Introductory remarks Technical Specification a Mini-Standard permits implementors
More informationPortable, MPI-Interoperable! Coarray Fortran
Portable, MPI-Interoperable! Coarray Fortran Chaoran Yang, 1 Wesley Bland, 2! John Mellor-Crummey, 1 Pavan Balaji 2 1 Department of Computer Science! Rice University! Houston, TX 2 Mathematics and Computer
More informationPortable, MPI-Interoperable! Coarray Fortran
Portable, MPI-Interoperable! Coarray Fortran Chaoran Yang, 1 Wesley Bland, 2! John Mellor-Crummey, 1 Pavan Balaji 2 1 Department of Computer Science! Rice University! Houston, TX 2 Mathematics and Computer
More informationOverlapping Computation and Communication for Advection on Hybrid Parallel Computers
Overlapping Computation and Communication for Advection on Hybrid Parallel Computers James B White III (Trey) trey@ucar.edu National Center for Atmospheric Research Jack Dongarra dongarra@eecs.utk.edu
More informationCo-arrays to be included in the Fortran 2008 Standard
Co-arrays to be included in the Fortran 2008 Standard John Reid, ISO Fortran Convener The ISO Fortran Committee has decided to include co-arrays in the next revision of the Standard. Aim of this talk:
More informationDynamic Load Balancing for Weather Models via AMPI
Dynamic Load Balancing for Eduardo R. Rodrigues IBM Research Brazil edrodri@br.ibm.com Celso L. Mendes University of Illinois USA cmendes@ncsa.illinois.edu Laxmikant Kale University of Illinois USA kale@cs.illinois.edu
More informationAn evaluation of the Performance and Scalability of a Yellowstone Test-System in 5 Benchmarks
An evaluation of the Performance and Scalability of a Yellowstone Test-System in 5 Benchmarks WRF Model NASA Parallel Benchmark Intel MPI Bench My own personal benchmark HPC Challenge Benchmark Abstract
More informationParallel Programming Features in the Fortran Standard. Steve Lionel 12/4/2012
Parallel Programming Features in the Fortran Standard Steve Lionel 12/4/2012 Agenda Overview of popular parallelism methodologies FORALL a look back DO CONCURRENT Coarrays Fortran 2015 Q+A 12/5/2012 2
More informationParallel Programming with Coarray Fortran
Parallel Programming with Coarray Fortran SC10 Tutorial, November 15 th 2010 David Henty, Alan Simpson (EPCC) Harvey Richardson, Bill Long, Nathan Wichmann (Cray) Tutorial Overview The Fortran Programming
More informationLeveraging OpenCoarrays to Support Coarray Fortran on IBM Power8E
Executive Summary Leveraging OpenCoarrays to Support Coarray Fortran on IBM Power8E Alessandro Fanfarillo, Damian Rouson Sourcery Inc. www.sourceryinstitue.org We report on the experience of installing
More informationA Coarray Fortran Implementation to Support Data-Intensive Application Development
A Coarray Fortran Implementation to Support Data-Intensive Application Development Deepak Eachempati 1, Alan Richardson 2, Terrence Liao 3, Henri Calandra 3, Barbara Chapman 1 Data-Intensive Scalable Computing
More informationMigrating A Scientific Application from MPI to Coarrays. John Ashby and John Reid HPCx Consortium Rutherford Appleton Laboratory STFC UK
Migrating A Scientific Application from MPI to Coarrays John Ashby and John Reid HPCx Consortium Rutherford Appleton Laboratory STFC UK Why and Why Not? +MPI programming is arcane +New emerging paradigms
More informationProgramming Models for Supercomputing in the Era of Multicore
Programming Models for Supercomputing in the Era of Multicore Marc Snir MULTI-CORE CHALLENGES 1 Moore s Law Reinterpreted Number of cores per chip doubles every two years, while clock speed decreases Need
More informationPGAS languages. The facts, the myths and the requirements. Dr Michèle Weiland Monday, 1 October 12
PGAS languages The facts, the myths and the requirements Dr Michèle Weiland m.weiland@epcc.ed.ac.uk What is PGAS? a model, not a language! based on principle of partitioned global address space many different
More informationNERSC Site Update. National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory. Richard Gerber
NERSC Site Update National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory Richard Gerber NERSC Senior Science Advisor High Performance Computing Department Head Cori
More informationFortran Coarrays John Reid, ISO Fortran Convener, JKR Associates and Rutherford Appleton Laboratory
Fortran Coarrays John Reid, ISO Fortran Convener, JKR Associates and Rutherford Appleton Laboratory This talk will explain the objectives of coarrays, give a quick summary of their history, describe the
More informationNVIDIA Update and Directions on GPU Acceleration for Earth System Models
NVIDIA Update and Directions on GPU Acceleration for Earth System Models Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA Carl Ponder, PhD, Applications Software Engineer, NVIDIA,
More informationHPC Performance Advances for Existing US Navy NWP Systems
HPC Performance Advances for Existing US Navy NWP Systems Timothy Whitcomb, Kevin Viner Naval Research Laboratory Marine Meteorology Division Monterey, CA Matthew Turner DeVine Consulting, Monterey, CA
More informationThe Icosahedral Nonhydrostatic (ICON) Model
The Icosahedral Nonhydrostatic (ICON) Model Scalability on Massively Parallel Computer Architectures Florian Prill, DWD + the ICON team 15th ECMWF Workshop on HPC in Meteorology October 2, 2012 ICON =
More informationDeutscher Wetterdienst
Accelerating Work at DWD Ulrich Schättler Deutscher Wetterdienst Roadmap Porting operational models: revisited Preparations for enabling practical work at DWD My first steps with the COSMO on a GPU First
More informationDetermining Optimal MPI Process Placement for Large- Scale Meteorology Simulations with SGI MPIplace
Determining Optimal MPI Process Placement for Large- Scale Meteorology Simulations with SGI MPIplace James Southern, Jim Tuccillo SGI 25 October 2016 0 Motivation Trend in HPC continues to be towards more
More informationCS 470 Spring Parallel Languages. Mike Lam, Professor
CS 470 Spring 2017 Mike Lam, Professor Parallel Languages Graphics and content taken from the following: http://dl.acm.org/citation.cfm?id=2716320 http://chapel.cray.com/papers/briefoverviewchapel.pdf
More informationEvaluating New Communication Models in the Nek5000 Code for Exascale
Evaluating New Communication Models in the Nek5000 Code for Exascale Ilya Ivanov (KTH), Rui Machado (Fraunhofer), Mirko Rahn (Fraunhofer), Dana Akhmetova (KTH), Erwin Laure (KTH), Jing Gong (KTH), Philipp
More informationIntroduction to Parallel Programming
Introduction to Parallel Programming Overview Parallel programming allows the user to use multiple cpus concurrently Reasons for parallel execution: shorten execution time by spreading the computational
More informationTowards Exascale Programming Models HPC Summit, Prague Erwin Laure, KTH
Towards Exascale Programming Models HPC Summit, Prague Erwin Laure, KTH 1 Exascale Programming Models With the evolution of HPC architecture towards exascale, new approaches for programming these machines
More informationParallel Programming Models. Parallel Programming Models. Threads Model. Implementations 3/24/2014. Shared Memory Model (without threads)
Parallel Programming Models Parallel Programming Models Shared Memory (without threads) Threads Distributed Memory / Message Passing Data Parallel Hybrid Single Program Multiple Data (SPMD) Multiple Program
More informationDeutscher Wetterdienst
Porting Operational Models to Multi- and Many-Core Architectures Ulrich Schättler Deutscher Wetterdienst Oliver Fuhrer MeteoSchweiz Xavier Lapillonne MeteoSchweiz Contents Strong Scalability of the Operational
More informationHPC code modernization with Intel development tools
HPC code modernization with Intel development tools Bayncore, Ltd. Intel HPC Software Workshop Series 2016 HPC Code Modernization for Intel Xeon and Xeon Phi February 17 th 2016, Barcelona Microprocessor
More informationAppendix D. Fortran quick reference
Appendix D Fortran quick reference D.1 Fortran syntax... 315 D.2 Coarrays... 318 D.3 Fortran intrisic functions... D.4 History... 322 323 D.5 Further information... 324 Fortran 1 is the oldest high-level
More informationIntroduction to parallel computing concepts and technics
Introduction to parallel computing concepts and technics Paschalis Korosoglou (support@grid.auth.gr) User and Application Support Unit Scientific Computing Center @ AUTH Overview of Parallel computing
More informationHybrid Coarrays: a PGAS Feature for Many-Core Architectures
Hybrid Coarrays: a PGAS Feature for Many-Core Architectures Valeria CARDELLINI a and Alessandro FANFARILLO a and Salvatore FILIPPONE a and Damian ROUSON b a Dipartimento di Ingegneria Civile e Ingegneria
More informationMPI+X on The Way to Exascale. William Gropp
MPI+X on The Way to Exascale William Gropp http://wgropp.cs.illinois.edu Likely Exascale Architectures (Low Capacity, High Bandwidth) 3D Stacked Memory (High Capacity, Low Bandwidth) Thin Cores / Accelerators
More informationWhatÕs New in the Message-Passing Toolkit
WhatÕs New in the Message-Passing Toolkit Karl Feind, Message-passing Toolkit Engineering Team, SGI ABSTRACT: SGI message-passing software has been enhanced in the past year to support larger Origin 2
More informationIntroduction to parallel Computing
Introduction to parallel Computing VI-SEEM Training Paschalis Paschalis Korosoglou Korosoglou (pkoro@.gr) (pkoro@.gr) Outline Serial vs Parallel programming Hardware trends Why HPC matters HPC Concepts
More informationHigh Performance Computing
The Need for Parallelism High Performance Computing David McCaughan, HPC Analyst SHARCNET, University of Guelph dbm@sharcnet.ca Scientific investigation traditionally takes two forms theoretical empirical
More informationHANDLING LOAD IMBALANCE IN DISTRIBUTED & SHARED MEMORY
HANDLING LOAD IMBALANCE IN DISTRIBUTED & SHARED MEMORY Presenters: Harshitha Menon, Seonmyeong Bak PPL Group Phil Miller, Sam White, Nitin Bhat, Tom Quinn, Jim Phillips, Laxmikant Kale MOTIVATION INTEGRATED
More informationParallel Programming. Libraries and Implementations
Parallel Programming Libraries and Implementations Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationParallel and High Performance Computing CSE 745
Parallel and High Performance Computing CSE 745 1 Outline Introduction to HPC computing Overview Parallel Computer Memory Architectures Parallel Programming Models Designing Parallel Programs Parallel
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationAdvanced Parallel Programming. Is there life beyond MPI?
Advanced Parallel Programming Is there life beyond MPI? Outline MPI vs. High Level Languages Declarative Languages Map Reduce and Hadoop Shared Global Address Space Languages Charm++ ChaNGa ChaNGa on GPUs
More informationNew Programming Paradigms: Partitioned Global Address Space Languages
Raul E. Silvera -- IBM Canada Lab rauls@ca.ibm.com ECMWF Briefing - April 2010 New Programming Paradigms: Partitioned Global Address Space Languages 2009 IBM Corporation Outline Overview of the PGAS programming
More informationThe MPI Message-passing Standard Practical use and implementation (I) SPD Course 2/03/2010 Massimo Coppola
The MPI Message-passing Standard Practical use and implementation (I) SPD Course 2/03/2010 Massimo Coppola What is MPI MPI: Message Passing Interface a standard defining a communication library that allows
More informationRunning the FIM and NIM Weather Models on GPUs
Running the FIM and NIM Weather Models on GPUs Mark Govett Tom Henderson, Jacques Middlecoff, Jim Rosinski, Paul Madden NOAA Earth System Research Laboratory Global Models 0 to 14 days 10 to 30 KM resolution
More informationTechnology for a better society. hetcomp.com
Technology for a better society hetcomp.com 1 J. Seland, C. Dyken, T. R. Hagen, A. R. Brodtkorb, J. Hjelmervik,E Bjønnes GPU Computing USIT Course Week 16th November 2011 hetcomp.com 2 9:30 10:15 Introduction
More informationPorting The Spectral Element Community Atmosphere Model (CAM-SE) To Hybrid GPU Platforms
Porting The Spectral Element Community Atmosphere Model (CAM-SE) To Hybrid GPU Platforms http://www.scidacreview.org/0902/images/esg13.jpg Matthew Norman Jeffrey Larkin Richard Archibald Valentine Anantharaj
More informationMyths and reality of communication/computation overlap in MPI applications
Myths and reality of communication/computation overlap in MPI applications Alessandro Fanfarillo National Center for Atmospheric Research Boulder, Colorado, USA elfanfa@ucar.edu Oct 12th, 2017 (elfanfa@ucar.edu)
More informationTitan - Early Experience with the Titan System at Oak Ridge National Laboratory
Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid
More information6.1 Multiprocessor Computing Environment
6 Parallel Computing 6.1 Multiprocessor Computing Environment The high-performance computing environment used in this book for optimization of very large building structures is the Origin 2000 multiprocessor,
More informationHARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES. Cliff Woolley, NVIDIA
HARNESSING IRREGULAR PARALLELISM: A CASE STUDY ON UNSTRUCTURED MESHES Cliff Woolley, NVIDIA PREFACE This talk presents a case study of extracting parallelism in the UMT2013 benchmark for 3D unstructured-mesh
More informationThe Case for Collective Pattern Specification
The Case for Collective Pattern Specification Torsten Hoefler, Jeremiah Willcock, ArunChauhan, and Andrew Lumsdaine Advances in Message Passing, Toronto, ON, June 2010 Motivation and Main Theses Message
More informationCS4961 Parallel Programming. Lecture 16: Introduction to Message Passing 11/3/11. Administrative. Mary Hall November 3, 2011.
CS4961 Parallel Programming Lecture 16: Introduction to Message Passing Administrative Next programming assignment due on Monday, Nov. 7 at midnight Need to define teams and have initial conversation with
More informationCharm++ Workshop 2010
Charm++ Workshop 2010 Eduardo R. Rodrigues Institute of Informatics Federal University of Rio Grande do Sul - Brazil ( visiting scholar at CS-UIUC ) errodrigues@inf.ufrgs.br Supported by Brazilian Ministry
More informationDirective-based Programming for Highly-scalable Nodes
Directive-based Programming for Highly-scalable Nodes Doug Miles Michael Wolfe PGI Compilers & Tools NVIDIA Cray User Group Meeting May 2016 Talk Outline Increasingly Parallel Nodes Exposing Parallelism
More informationBulk Synchronous and SPMD Programming. The Bulk Synchronous Model. CS315B Lecture 2. Bulk Synchronous Model. The Machine. A model
Bulk Synchronous and SPMD Programming The Bulk Synchronous Model CS315B Lecture 2 Prof. Aiken CS 315B Lecture 2 1 Prof. Aiken CS 315B Lecture 2 2 Bulk Synchronous Model The Machine A model An idealized
More informationCAF versus MPI Applicability of Coarray Fortran to a Flow Solver
CAF versus MPI Applicability of Coarray Fortran to a Flow Solver Manuel Hasert, Harald Klimach, Sabine Roller m.hasert@grs-sim.de Applied Supercomputing in Engineering Motivation We develop several CFD
More informationEvolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University Scalable Tools Workshop 7 August 2017
Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University http://hpctoolkit.org Scalable Tools Workshop 7 August 2017 HPCToolkit 1 HPCToolkit Workflow source code compile &
More informationA performance portable implementation of HOMME via the Kokkos programming model
E x c e p t i o n a l s e r v i c e i n t h e n a t i o n a l i n t e re s t A performance portable implementation of HOMME via the Kokkos programming model L.Bertagna, M.Deakin, O.Guba, D.Sunderland,
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationIMPLEMENTATION AND EVALUATION OF ADDITIONAL PARALLEL FEATURES IN COARRAY FORTRAN
IMPLEMENTATION AND EVALUATION OF ADDITIONAL PARALLEL FEATURES IN COARRAY FORTRAN A Thesis Presented to the Faculty of the Department of Computer Science University of Houston In Partial Fulfillment of
More informationParallel Programming without MPI Using Coarrays in Fortran SUMMERSCHOOL
Parallel Programming without MPI Using Coarrays in Fortran SUMMERSCHOOL 2007 2015 August 5, 2015 Ge Baolai SHARCNET Western University Outline What is coarray How to write: Terms, syntax How to compile
More informationOp#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD
Op#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD Riyaz Haque and David F. Richards This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore
More informationProceedings of the GCC Developers Summit
Reprinted from the Proceedings of the GCC Developers Summit June 17th 19th, 2008 Ottawa, Ontario Canada Conference Organizers Andrew J. Hutton, Steamballoon, Inc., Linux Symposium, Thin Lines Mountaineering
More informationParallel Programming. Libraries and implementations
Parallel Programming Libraries and implementations Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationAdapting Numerical Weather Prediction codes to heterogeneous architectures: porting the COSMO model to GPUs
Adapting Numerical Weather Prediction codes to heterogeneous architectures: porting the COSMO model to GPUs O. Fuhrer, T. Gysi, X. Lapillonne, C. Osuna, T. Dimanti, T. Schultess and the HP2C team Eidgenössisches
More informationEULAG: high-resolution computational model for research of multi-scale geophysical fluid dynamics
Zbigniew P. Piotrowski *,** EULAG: high-resolution computational model for research of multi-scale geophysical fluid dynamics *Geophysical Turbulence Program, National Center for Atmospheric Research,
More informationIntroduction to Parallel Programming. Tuesday, April 17, 12
Introduction to Parallel Programming 1 Overview Parallel programming allows the user to use multiple cpus concurrently Reasons for parallel execution: shorten execution time by spreading the computational
More informationOverview: The OpenMP Programming Model
Overview: The OpenMP Programming Model motivation and overview the parallel directive: clauses, equivalent pthread code, examples the for directive and scheduling of loop iterations Pi example in OpenMP
More informationAchieving Efficient Strong Scaling with PETSc Using Hybrid MPI/OpenMP Optimisation
Achieving Efficient Strong Scaling with PETSc Using Hybrid MPI/OpenMP Optimisation Michael Lange 1 Gerard Gorman 1 Michele Weiland 2 Lawrence Mitchell 2 Xiaohu Guo 3 James Southern 4 1 AMCG, Imperial College
More informationImplementation and Evaluation of Coarray Fortran Translator Based on OMNI XcalableMP. October 29, 2015 Hidetoshi Iwashita, RIKEN AICS
Implementation and Evaluation of Coarray Fortran Translator Based on OMNI XcalableMP October 29, 2015 Hidetoshi Iwashita, RIKEN AICS Background XMP Contains Coarray Features XcalableMP (XMP) A PGAS language,
More informationSINGLE-SIDED PGAS COMMUNICATIONS LIBRARIES. Basic usage of OpenSHMEM
SINGLE-SIDED PGAS COMMUNICATIONS LIBRARIES Basic usage of OpenSHMEM 2 Outline Concept and Motivation Remote Read and Write Synchronisation Implementations OpenSHMEM Summary 3 Philosophy of the talks In
More informationOpenACC/CUDA/OpenMP... 1 Languages and Libraries... 3 Multi-GPU support... 4 How OpenACC Works... 4
OpenACC Course Class #1 Q&A Contents OpenACC/CUDA/OpenMP... 1 Languages and Libraries... 3 Multi-GPU support... 4 How OpenACC Works... 4 OpenACC/CUDA/OpenMP Q: Is OpenACC an NVIDIA standard or is it accepted
More informationAdvanced Features. SC10 Tutorial, November 15 th Parallel Programming with Coarray Fortran
Advanced Features SC10 Tutorial, November 15 th 2010 Parallel Programming with Coarray Fortran Advanced Features: Overview Execution segments and Synchronisation Non-global Synchronisation Critical Sections
More informationProgramming for High Performance Computing in Modern Fortran. Bill Long, Cray Inc. 17-May-2005
Programming for High Performance Computing in Modern Fortran Bill Long, Cray Inc. 17-May-2005 Concepts in HPC Efficient and structured layout of local data - modules and allocatable arrays Efficient operations
More informationMPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016
MPI and OpenMP (Lecture 25, cs262a) Ion Stoica, UC Berkeley November 19, 2016 Message passing vs. Shared memory Client Client Client Client send(msg) recv(msg) send(msg) recv(msg) MSG MSG MSG IPC Shared
More informationPROGRAMMING MODEL EXAMPLES
( Cray Inc 2015) PROGRAMMING MODEL EXAMPLES DEMONSTRATION EXAMPLES OF VARIOUS PROGRAMMING MODELS OVERVIEW Building an application to use multiple processors (cores, cpus, nodes) can be done in various
More informationProgress on GPU Parallelization of the NIM Prototype Numerical Weather Prediction Dynamical Core
Progress on GPU Parallelization of the NIM Prototype Numerical Weather Prediction Dynamical Core Tom Henderson NOAA/OAR/ESRL/GSD/ACE Thomas.B.Henderson@noaa.gov Mark Govett, Jacques Middlecoff Paul Madden,
More informationImplementation of Parallelization
Implementation of Parallelization OpenMP, PThreads and MPI Jascha Schewtschenko Institute of Cosmology and Gravitation, University of Portsmouth May 9, 2018 JAS (ICG, Portsmouth) Implementation of Parallelization
More informationHigh performance computing and numerical modeling
High performance computing and numerical modeling Volker Springel Plan for my lectures Lecture 1: Collisional and collisionless N-body dynamics Lecture 2: Gravitational force calculation Lecture 3: Basic
More informationA Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004
A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into
More informationElementary Parallel Programming with Examples. Reinhold Bader (LRZ) Georg Hager (RRZE)
Elementary Parallel Programming with Examples Reinhold Bader (LRZ) Georg Hager (RRZE) Two Paradigms for Parallel Programming Hardware Designs Distributed Memory M Message Passing explicit programming required
More informationICON for HD(CP) 2. High Definition Clouds and Precipitation for Advancing Climate Prediction
ICON for HD(CP) 2 High Definition Clouds and Precipitation for Advancing Climate Prediction High Definition Clouds and Precipitation for Advancing Climate Prediction ICON 2 years ago Parameterize shallow
More informationA Local-View Array Library for Partitioned Global Address Space C++ Programs
Lawrence Berkeley National Laboratory A Local-View Array Library for Partitioned Global Address Space C++ Programs Amir Kamil, Yili Zheng, and Katherine Yelick Lawrence Berkeley Lab Berkeley, CA, USA June
More informationScalable Software Transactional Memory for Chapel High-Productivity Language
Scalable Software Transactional Memory for Chapel High-Productivity Language Srinivas Sridharan and Peter Kogge, U. Notre Dame Brad Chamberlain, Cray Inc Jeffrey Vetter, Future Technologies Group, ORNL
More informationCPU GPU. Regional Models. Global Models. Bigger Systems More Expensive Facili:es Bigger Power Bills Lower System Reliability
Xbox 360 Successes and Challenges using GPUs for Weather and Climate Models DOE Jaguar Mark GoveM Jacques Middlecoff, Tom Henderson, Jim Rosinski, Craig Tierney CPU Bigger Systems More Expensive Facili:es
More informationExploring XcalableMP. Shun Liang. August 24, 2012
Exploring XcalableMP Shun Liang August 24, 2012 MSc in High Performance Computing The University of Edinburgh Year of Presentation: 2012 Abstract This project has implemented synthetic and application
More informationIntel Xeon Phi архитектура, модели программирования, оптимизация.
Нижний Новгород, 2017 Intel Xeon Phi архитектура, модели программирования, оптимизация. Дмитрий Прохоров, Дмитрий Рябцев, Intel Agenda What and Why Intel Xeon Phi Top 500 insights, roadmap, architecture
More informationPerformance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture. Alexander Berreth. Markus Bühler, Benedikt Anlauf
PADC Anual Workshop 20 Performance of the 3D-Combustion Simulation Code RECOM-AIOLOS on IBM POWER8 Architecture Alexander Berreth RECOM Services GmbH, Stuttgart Markus Bühler, Benedikt Anlauf IBM Deutschland
More informationCCSM Performance with the New Coupler, cpl6
CCSM Performance with the New Coupler, cpl6 Tony Craig Brian Kauffman Tom Bettge National Center for Atmospheric Research Jay Larson Rob Jacob Everest Ong Argonne National Laboratory Chris Ding Helen He
More informationGPU-Powered WRF in the Cloud for Research and Operational Applications
GPU-Powered WRF in the Cloud for Research and Operational Applications John Manobianco, Chief Scientist Don Berchoff, Chief Technical Officer john@tempoquest.com, don@tempoquest.com 2017 Modeling Research
More informationThe Cray Programming Environment. An Introduction
The Cray Programming Environment An Introduction Vision Cray systems are designed to be High Productivity as well as High Performance Computers The Cray Programming Environment (PE) provides a simple consistent
More informationPorting and Tuning WRF Physics Packages on Intel Xeon and Xeon Phi and NVIDIA GPU
Porting and Tuning WRF Physics Packages on Intel Xeon and Xeon Phi and NVIDIA GPU Tom Henderson Thomas.B.Henderson@noaa.gov Mark Govett, James Rosinski, Jacques Middlecoff NOAA Global Systems Division
More informationWorkloads Programmierung Paralleler und Verteilter Systeme (PPV)
Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Workloads 2 Hardware / software execution environment
More informationLocality/Affinity Features COMPUTE STORE ANALYZE
Locality/Affinity Features Safe Harbor Statement This presentation may contain forward-looking statements that are based on our current expectations. Forward looking statements may include statements about
More informationCoarrays in the next Fortran Standard
ISO/IEC JTC1/SC22/WG5 N1824 Coarrays in the next Fortran Standard John Reid, JKR Associates, UK April 21, 2010 Abstract Coarrays will be included in the next Fortran Standard, known informally as Fortran
More informationChapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues
Chapter 4: Threads Silberschatz, Galvin and Gagne 2013 Chapter 4: Threads Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues 4.2 Silberschatz, Galvin
More information