CCSM Performance with the New Coupler, cpl6

1 CCSM Performance with the New Coupler, cpl6
Tony Craig, Brian Kauffman, Tom Bettge (National Center for Atmospheric Research)
Jay Larson, Rob Jacob, Everest Ong (Argonne National Laboratory)
Chris Ding, Helen He (Lawrence Berkeley National Laboratory)
RIST Workshop, March 3-5, 2003, INGV, Rome, Italy

2 Topics
CCSM overview
cpl5 review
cpl6 goals
cpl6 design and datatypes
cpl6 performance: merging, mapping, communication
Summary

3 CCSM Overview
CCSM = Community Climate System Model (NCAR)
Designed to evaluate and understand Earth's global climate, both historical and future.
Multiple executables (5):
Atmosphere (CAM), MPI/OpenMP
Ocean (POP), MPI
Land (CLM2), MPI/OpenMP
Sea Ice (CSIM4), MPI
Coupler (cpl5), OpenMP

4 CCSM2 Hub and Spoke System
[Diagram: the coupler (cpl) at the hub, connected to the atm, ocn, lnd, and ice components]
Each component is a separate executable
Each component runs on a unique set of hardware processors
All communications go through the coupler
The coupler communicates with all components:
maps (interpolates) data
merges fields
computes some fluxes
has diagnostic, history, and restart capability

5 CCSM Platforms
Currently supported: IBM Power3, Power4; SGI Origin O2K, O3K
Nearly supported: HP/Compaq
Future? Linux; vector platforms (Cray X1, NEC/Earth Simulator)

6 CCSM Resolution and Timing
T42 resolution atm and land, 26 vertical levels in the atm (128x64x26, ~200k cells)
1 degree resolution ocean and ice, 40 vertical levels in the ocean (320x384x40, ~4M cells)
On 100 processors of an IBM Power4 (8 processors/node, 1.3 GHz clock, Colony):
Model runs about 10 simulated years/day
Requires about a month to run 300 years (see the arithmetic sketch below)
Science requirements set the coupling frequency between models and the data flow
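A quick back-of-the-envelope check of the grid sizes and throughput quoted above. This is a minimal sketch; the cell counts and wall-clock figures are simply the rounded values from the slide, not independent measurements.

```python
# Rough arithmetic behind the resolution and timing numbers on this slide.

atm_cells = 128 * 64 * 26        # T42, 26 levels  -> ~213k cells ("200k" on the slide)
ocn_cells = 320 * 384 * 40       # 1-degree, 40 levels -> ~4.9M cells ("4M" on the slide)
print(f"atm cells: {atm_cells:,}  ocn cells: {ocn_cells:,}")

sim_years_per_day = 10           # throughput on 100 Power4 processors
target_years = 300
wallclock_days = target_years / sim_years_per_day
print(f"{target_years} simulated years at {sim_years_per_day} years/day "
      f"-> {wallclock_days:.0f} wall-clock days (about a month)")
```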

7 CCSM Overview (part 2)
F90 primarily
NetCDF history files
Binary restart files
SciDAC (DOE)
ESMF (NASA)

8 cpl5 Shortcomings
cpl5 is a shared memory application and uses OpenMP threading to achieve parallel computation efficiency
cpl5 communication with the models is not parallelized: it does root-to-root communication only, which requires gathers and scatters on distributed memory components (illustrated in the sketch below)
Increasing resolutions or coupling frequencies could result in coupling performance bottlenecks
cpl5 coupling is hard-wired to MPI and is not easily extensible as currently implemented
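To make the root-to-root bottleneck concrete, here is a minimal mpi4py sketch (not cpl5 code; the field size and layout are made up) of the pattern described above: a distributed component must gather a field onto its root, send it root-to-root to the coupler, and the receiving side must scatter it again, so all data funnels through two processes.

```python
# Hypothetical illustration of cpl5-style root-to-root coupling (not actual CCSM code).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD          # pretend this is one component's communicator
rank, size = comm.Get_rank(), comm.Get_size()

local_field = np.full(1000, rank, dtype="d")   # each PE owns a slice of some field

# Step 1: gather the whole field onto the component root ...
full_field = None
if rank == 0:
    full_field = np.empty(1000 * size, dtype="d")
comm.Gather(local_field, full_field, root=0)

# Step 2: ... the root alone would then send the full field to the coupler root, and
# the coupler would scatter it across its own PEs.  Every byte moves through two
# processes, which is what the gather/scatter times on slide 26 measure.
if rank == 0:
    print("root now holds", full_field.size, "values to forward to the coupler")
```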

9 cpl6 Goals
Create a fully parallel, distributed memory coupler
Implement M to N communication between components
Improve communication performance to eliminate any potential future bottlenecks resulting from increased resolution
Improve coupling interfaces; abstract the communication method away from the components
Improve usability, flexibility, and extensibility of the coupled system
Improve overall performance

10 The Solution
Build a new coupler framework with abstracted, parallel communication software in the foundation.
Create a coupler application instantiation called cpl6 which reproduces the functionality of cpl5.
[Stack diagram: cpl6 built on top of MCT* and MPH**]
*Model Coupling Toolkit
**Multi-Component Handshaking Library

11 MCT: Model Coupling Toolkit
www-unix.mcs.anl.gov/acpi/mct
Major attributes:
Maintains model decomposition descriptors (e.g., global to local indexing)
Inter- and intra-component communications and parallel data transfer (routing, rearranging)
Flexible, extensible, indexable field storage
Time averaging and accumulation
Regridding (via sparse matrix vector multiply, sketched below)
MCT eases the construction of coupler computational cores and component-coupler interfaces.
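A minimal sketch of the regridding idea mentioned above: interpolation weights are stored as a sparse matrix, and mapping a field from one grid to another is a sparse matrix-vector multiply. This is illustrative only (scipy, random weights), not MCT's Fortran implementation.

```python
# Regridding as sparse matrix-vector multiply (illustrative only, not MCT code).
import numpy as np
from scipy.sparse import csr_matrix

n_src, n_dst = 122_880, 8_192          # e.g. ocean grid -> T42 atmosphere grid

# Toy weight matrix: each destination point averages two arbitrary source points.
rows = np.repeat(np.arange(n_dst), 2)
cols = np.random.randint(0, n_src, size=2 * n_dst)
vals = np.full(2 * n_dst, 0.5)
W = csr_matrix((vals, (rows, cols)), shape=(n_dst, n_src))

src_field = np.random.rand(n_src)      # a field on the source grid
dst_field = W @ src_field              # interpolated field on the destination grid
print(dst_field.shape)                 # (8192,)
```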

12 MPH: Multi-Component Handshaking Library
General features:
Built on MPI
Establishes an MPI communicator for each component (sketched below)
Performs component name registration
Allows resource allocation for each component
Supports different execution modes
MPH allows generalized communicator assignment, simplifying the setup of component model communication and inter-component communication.
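The communicator-per-component idea can be sketched with a plain communicator split (mpi4py here; the component names and ranks-per-component layout are hypothetical, and MPH's actual registration protocol is richer than this).

```python
# Hypothetical sketch of giving each component its own communicator (not MPH itself).
from mpi4py import MPI

world = MPI.COMM_WORLD
rank = world.Get_rank()

# Pretend the job layout assigns ranks to components by range (made-up layout).
if rank < 4:
    component = "cpl"
elif rank < 36:
    component = "atm"
else:
    component = "ocn"

# One split per component name: every rank in the same component shares a communicator.
color = {"cpl": 0, "atm": 1, "ocn": 2}[component]
comp_comm = world.Split(color, key=rank)

print(f"world rank {rank} -> {component} rank {comp_comm.Get_rank()} "
      f"of {comp_comm.Get_size()}")
```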

13 cpl6 Architecture
[Layer diagram, top to bottom:]
Layer 1a: main program and MCT wrapper (control, main data, msg, map, flux, restart, history, diag)
Layer 1b: coupling interface
Layer 1c: calendar, utilities, csm_share, datatypes
Layers 2-5: MCT derived objects, MCT base objects, MPEU utilities, vendor utilities

14 MCT Data Types
Attribute Vector: fundamental data storage type; 2d integer and real arrays (field, grid point); strings for field names
Global Seg Map: decomposition information
Router: M to N communication information
Rearranger: local communication information
Smat: scattered mapping matrix data
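As an illustration of the data layout described above (a hypothetical Python sketch, not MCT's actual Fortran derived types), an attribute vector is essentially a 2-D array indexed by field name and local grid point, and a global segment map records which global indices each processor owns.

```python
# Hypothetical sketch of two core MCT ideas (not the real Fortran derived types).
import numpy as np

class AttrVect:
    """Field storage: one row per named field, one column per local grid point."""
    def __init__(self, field_names, lsize):
        self.fields = list(field_names)                  # strings identify the fields
        self.data = np.zeros((len(field_names), lsize))  # (field, grid point)

    def export(self, name):
        return self.data[self.fields.index(name)]

class GlobalSegMap:
    """Decomposition: for each PE, the global start index and length of its segments."""
    def __init__(self, segments_by_pe):
        self.segments_by_pe = segments_by_pe             # {pe: [(gstart, length), ...]}

# Example: 16 ocean points split across 2 PEs, two fields stored per local point.
gsmap = GlobalSegMap({0: [(0, 8)], 1: [(8, 8)]})
av = AttrVect(["sst", "ifrac"], lsize=8)
av.export("sst")[:] = 273.15
print(av.data.shape)    # (2, 8)
```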

15 cpl6 Data Types
Infobuffer: vector of integers and reals, scalar data
Domain: cpl6 grid data type; name, Attribute Vector of grid data, GSMap
Bundle: fundamental cpl6 storage data type, array data; name, Domain, Attribute Vector, counter
Contract: Bundle, Infobuffer, Router
Map: name, Smat, Domains, Rearranger
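The composition of these types can be sketched as follows (hypothetical Python dataclasses reusing the AttrVect/GlobalSegMap sketch above; the real cpl6 types are Fortran 90 derived types with more members than shown here).

```python
# Hypothetical sketch of how the cpl6 types compose (not the real F90 derived types).
from dataclasses import dataclass, field

@dataclass
class Infobuffer:                 # scalar control data exchanged alongside fields
    ibuf: list = field(default_factory=list)   # integers (dates, flags, counts)
    rbuf: list = field(default_factory=list)   # reals (time, constants)

@dataclass
class Domain:                     # a grid plus its decomposition
    name: str
    grid_av: "AttrVect"           # lat, lon, area, mask stored as an attribute vector
    gsmap: "GlobalSegMap"

@dataclass
class Bundle:                     # the basic unit of field storage in cpl6
    name: str
    domain: Domain
    data_av: "AttrVect"
    count: int = 0                # accumulation counter for time averaging

@dataclass
class Contract:                   # what is needed to exchange data with one component
    name: str
    bundle: Bundle
    infobuf: Infobuffer
    router: object                # M-to-N communication schedule
```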

16 cpl6 Modules
cpl_fields_mod:
Shared module, used by all components
Sets field numbers and names
Differentiates states and fluxes
Naming convention allows automatic routing of data between components for simple fields (see the sketch below)
cpl_interface_mod:
Simple interfaces, simple arguments
Components only need to define contract data types; within the interface, domains, routers, bundles, and contracts are initialized on the component processors and used
Components don't know about MCT, cpl6 data types, or the underlying communication method
Extensible: the cpl6 design is more flexible and extensible than cpl5
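A hypothetical sketch of how a field naming convention can drive automatic routing. The prefix scheme below is invented for illustration; the actual cpl_fields_mod names and conventions differ.

```python
# Hypothetical field-name convention driving automatic routing (illustration only).
# Assume names look like "<kind><src>_<quantity>", e.g. "Sa_temp" = State from atm.

FIELD_LIST = ["Sa_temp", "Sa_pslv", "Fa_rain", "So_sst", "Fo_evap"]

KIND = {"S": "state", "F": "flux"}
SOURCE = {"a": "atm", "o": "ocn", "i": "ice", "l": "lnd"}

def parse(name):
    prefix, quantity = name.split("_", 1)
    return {"kind": KIND[prefix[0]], "source": SOURCE[prefix[1]], "quantity": quantity}

# The coupler can route every field whose source matches a component automatically,
# without per-field hand-written code.
for f in FIELD_LIST:
    info = parse(f)
    print(f"{f}: {info['kind']} from {info['source']} ({info['quantity']})")
```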

17 cpl6 Design: Another View of CCSM
[Diagram: atm, lnd, ice, ocn, and cpl on their hardware processors, each sitting on a coupling interface layer]
In cpl5, MPI was the coupling interface
In cpl6, the coupler is now attached to each component
Components are unaware of the coupling method
Coupling work can be carried out on the component processors
A separate coupler is no longer absolutely required

18 CCSM Performance: cpl5 vs cpl6
Merging: trivially parallel operation, cache usage important; production configuration timings
Mapping: benchmark tests for a2o, o2a mapping; effect of bundling multiple interpolations; comparison of mapping in production configuration
Communication: focus on coupler to ice communication (high frequency, high resolution); unit tests and production configuration

19 Merging: cpl5 vs cpl6
Ocean grid (122,880 points)
240 merging calls, 16 fields
Production configuration
[Plot: merging time (secs) vs. number of pes for cpl5 and cpl6]

20 Merging Discussion
Some serial performance optimization has been carried out in cpl6
cpl6 performs better than cpl5 for merging
cpl6 scaling is not limited to a shared memory node
cpl6 scaling is acceptable for trivially parallel operations
OpenMP overhead is eliminated compared to cpl5
cpl6 will perform better than cpl5, and on a larger number of processors, for simple parallel operations.
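For context, a merge is a simple per-gridpoint weighted combination of fields, which is why it parallelizes trivially over the points each processor owns. A minimal sketch with assumed field names and weights; this is not the cpl6 merge routine.

```python
# Illustrative merge: combine land, ice, and ocean values cell by cell using surface
# fractions as weights (not the actual cpl6 merge code).
import numpy as np

npts = 122_880                       # in practice each PE works only on its local slice
frac_lnd = np.random.rand(npts)
frac_ice = np.random.rand(npts) * (1 - frac_lnd)
frac_ocn = 1.0 - frac_lnd - frac_ice

t_lnd = np.full(npts, 285.0)
t_ice = np.full(npts, 260.0)
t_ocn = np.full(npts, 290.0)

# The merged field is a weighted sum per grid point: no communication is needed,
# so the operation scales with the number of processors holding the points.
t_merged = frac_lnd * t_lnd + frac_ice * t_ice + frac_ocn * t_ocn
print(t_merged[:3])
```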

21 Mapping: cpl5 vs. cpl6
cpl5 mapping is a shared memory operation
cpl6 mapping is currently distributed memory parallel only, and allows distributed, parallel mapping across several compute nodes
cpl6 mapping requires MPI communication of data, in one of two orderings (both sketched below):
Rearrange the data decomposition on the source grid, then map to the destination grid, OR
Map to the destination grid, then rearrange to the destination decomposition
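A sketch of the two orderings listed above, using the sparse matrix-vector picture from the MCT slide. This is illustrative numpy/scipy only; the decomposition, weights, and the stand-in rearrange function are made up.

```python
# The two ways cpl6 can organize a parallel mapping (conceptual sketch only).
import numpy as np
from scipy.sparse import random as sparse_random

n_src, n_dst = 1000, 400
W = sparse_random(n_dst, n_src, density=0.01, format="csr")   # interpolation weights
x = np.random.rand(n_src)                                     # source-grid field

def rearrange(vec):
    """Stand-in for an MPI rearrange: move data to the decomposition a step needs."""
    return vec.copy()

# Option 1: rearrange the source data to match the matrix decomposition, then map.
y1 = W @ rearrange(x)

# Option 2: map with the source decomposition, then rearrange the result to the
# destination decomposition.
y2 = rearrange(W @ x)

assert np.allclose(y1, y2)   # same answer; the communication pattern and cost differ
```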

22 Mapping: ocn -> atm
Ocn (122,880 points) -> Atm (8192 points)
120 mapping calls, 9 fields
[Plot: mapping time (secs) vs. number of pes for cpl5 and cpl6]

23 Mapping: atm -> ocn, effect of bundling mapping fields
[Plot: secs/field vs. number of pes for 1 field, 8 fields, and 16 fields per mapping call]

24 Mapping: cpl5 vs cpl6
10 simulated days, production configuration
IBM Power4, 8-way nodes
[Plot: mapping time (secs) vs. number of pes for cpl5 and cpl6]

25 Mapping Discussion
Mapping with cpl6 outperforms cpl5 at the same processor count in unit tests for a2o and o2a
Bundling of fields for input into the mapping function is a clear winner over mapping single fields (see the sketch below)
Mapping performance is highly dependent upon (not shown): the grid sizes and shapes, the data decomposition used, and the load imbalance created by the mapping procedure
cpl6 mapping is slower than or the same speed as cpl5 in comparisons on a production configuration. The cpl6 mapping method was originally designed to be easy to use, flexible, and extensible; as a result, there are special intermediate bundles and domains, extra array copies, associated rearranging, and array operations that do not use cache efficiently. We expect to improve mapping performance in the near future.
cpl6 allows efficient, parallel, scalable mapping which can outperform cpl5 and provide flexibility for future CCSM configurations.
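The bundling effect noted above comes from applying one interpolation matrix to many fields at once, so the sparse-matrix index traversal and associated overhead are amortized across the bundle. A rough illustration with made-up sizes and weights; this is not a cpl6 benchmark.

```python
# Why bundling helps: one sparse matrix applied to many fields at once amortizes the
# per-matrix overhead (illustrative timing only, not a cpl6 measurement).
import time
import numpy as np
from scipy.sparse import random as sparse_random

n_src, n_dst, n_fields = 122_880, 8_192, 16
W = sparse_random(n_dst, n_src, density=1e-4, format="csr")
fields = np.random.rand(n_src, n_fields)          # 16 fields stored side by side

t0 = time.perf_counter()
for j in range(n_fields):                         # map one field at a time
    _ = W @ fields[:, j]
one_by_one = time.perf_counter() - t0

t0 = time.perf_counter()
_ = W @ fields                                    # map the whole bundle in one call
bundled = time.perf_counter() - t0

print(f"per-field mapping: {one_by_one:.4f} s, bundled mapping: {bundled:.4f} s")
```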

26 CCSM Communication: cpl5 vs cpl6
Coupler on 8 pes, ice component on 16 pes
240 transfers, 21 fields
Production configuration
[Timing diagram, contributions summed over the two transfer directions:]
cpl5: copy = 2.5 s + 1.3 s, comm = 7.5 s + 5.0 s, gather = 9.0 s, scatter = 36.2 s; cpl5 communication total = 61.5 s
cpl6: copy = 0.0 s + 0.0 s + 1.0 s + 0.5 s, comm = 9.0 s + 8.0 s; cpl6 communication total = 18.5 s

27 Communication: cpl5 vs. cpl6
[Table: cpl -> ice and ice -> cpl transfer times (secs) for cpl5 and cpl6, with coupler pes = 1 (4) and ice pes = 1 (16); ice fields of size 122,880; the 4/16 case is the "apples and apples" comparison]

28 Communication Discussion
The cpl5 numbers illustrate the current CCSM2 performance on a single node, with the coupler using 4 pes and the ice model using 16 pes (4/16 configuration).
Note that the cpl6<->ice single processor configuration simulates the root-to-root communication of cpl5<->ice in CCSM2.0; cpl6 is slower in this 1/1 configuration due to the overhead of pushing data through MCT.
When cpl6 utilizes the full parallel capability of the 4/16 configuration (apples and apples), it clearly outperforms cpl5.
cpl6 scales to larger numbers of pes.
cpl6 will be able to run on more pes than cpl5, will allow larger configurations of CCSM, and will improve communication performance.
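To contrast with the root-to-root sketch shown earlier, the M-to-N pattern lets every coupler processor exchange data directly with the ice processors that own overlapping grid points, with no gather or scatter through a root. A hypothetical mpi4py sketch of that idea; the overlap computation is trivialized here, whereas MCT's Router precomputes real segment intersections.

```python
# Hypothetical M-to-N exchange: each coupler PE sends directly to an ice PE, with no
# gather to a root (not actual MCT Router code).
from mpi4py import MPI
import numpy as np

world = MPI.COMM_WORLD
rank, size = world.Get_rank(), world.Get_size()
half = size // 2              # made-up layout: first half "cpl" PEs, second half "ice"
is_cpl = rank < half

if is_cpl and half > 0:
    # A real router would send only the segments each ice PE owns; here every cpl PE
    # sends one chunk to a partner ice PE to show the direct point-to-point pattern.
    chunk = np.full(1000, float(rank))
    world.Send(chunk, dest=half + rank, tag=0)
elif rank >= half and rank - half < half:
    chunk = np.empty(1000)
    world.Recv(chunk, source=rank - half, tag=0)
    print(f"ice PE {rank} received data directly from cpl PE {rank - half}")
```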

29 Summary
cpl6:
is a distributed memory application; no threading is implemented currently
does M to N communication
performance: generally faster and better scaling than cpl5; communication significantly faster than cpl5, eliminating a potentially important bottleneck; mapping in cpl6 (not MCT) requires further optimization
is more flexible, usable, and extensible
scientific validation nearly complete, release expected soon
