Unified Model Performance on the NEC SX-6

Paul Selwood, Met Office
Crown copyright 2004

Introduction

The Met Office
- National Weather Service
- Global and Local Area Climate Prediction (Hadley Centre)
- Operational and Research activities
- Relocated to Exeter from Bracknell in 2003/4
- 150th Anniversary in 2004

Supercomputers
- 1991-1996: Cray Y-MP/C90
- 1996-2004: Cray T3E
- 2003: NEC SX-6, operational May 2004
- Currently 2 x 15-node SX-6 systems
- May 2005: additional 15-node SX-8

Unified Model
- Single code used for NWP forecast and climate prediction
- N216L38 global (40 km in 2005), 20 km European (12 km in 2005), 12 km UK model (4 km in 2005)
- Submodels (atmosphere, ocean, ...)
- Grid-point model (regular lat-long)
- Non-hydrostatic, semi-implicit, semi-Lagrangian dynamics
- Arakawa C-grid, Charney-Phillips vertical staggering

I/O

I/O
- Route to disk depends on packet size: < 64 KB goes via NFS (slow), > 64 KB via GFS (fast)
- System buffering is only available for Fortran I/O; the Unified Model uses C
[Figure: histogram of I/O request sizes, frequency vs size in KBytes (0 to 1024 and above)]

I/O rates and improvements
Solution:
- User buffering of output data
- Removal of unnecessary opens and closes
- Subsequent buffering of headers increased the I/O rate to > 140 MB/s
- Considering use of locally attached disk in future (> 1 GB/s seen)

  STASH version        Time (s)   Rate (MB/s)
  Original             212        40
  Buffered             114        80
  Removed Open/Close   69         110
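As a rough illustration of the user-buffering fix, the sketch below (plain C, not the UM's actual STASH output code; UBUF_SIZE and the ubuf_* names are invented for the example) accumulates small output records in memory and flushes them in large chunks, so each physical write stays well above the 64 KB threshold and takes the fast GFS path.

```c
/* Minimal sketch of user-level output buffering (illustrative only).
 * Small records are staged in memory and flushed in chunks comfortably
 * above the 64 KB NFS/GFS threshold, so each physical write takes the
 * fast GFS path instead of many slow NFS-sized writes. */
#include <string.h>
#include <unistd.h>

#define UBUF_SIZE (1024 * 1024)          /* 1 MB flush size (assumed) */

static char   ubuf[UBUF_SIZE];
static size_t ubuf_len = 0;

static void ubuf_flush(int fd)
{
    if (ubuf_len > 0) {
        (void)write(fd, ubuf, ubuf_len); /* one large write instead of many small ones */
        ubuf_len = 0;
    }
}

static void ubuf_write(int fd, const void *data, size_t n)
{
    if (ubuf_len + n > UBUF_SIZE)        /* record would overflow the buffer: flush first */
        ubuf_flush(fd);
    if (n >= UBUF_SIZE) {                /* very large record: write it straight through */
        (void)write(fd, data, n);
        return;
    }
    memcpy(ubuf + ubuf_len, data, n);    /* otherwise stage it in the buffer */
    ubuf_len += n;
}
```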

Semi-Lagrangian Advection

Load Balancing - Advection
Semi-Lagrangian advection demonstrating load balance problems (N216L38, 2-day forecast):

  Routine                  Max (s)   Min (s)
  Theta departure points   29.67     18.46
  Wind departure points    57.55     35.84

Interpolation of Departure Points
- Need to calculate departure points for variables held at theta, u and v points
- Currently the departure point routine is called 3 times, once for each variable type
- The departure point routine is expensive and poorly load balanced, as extra calculations are done for latitudes above 80 degrees
- Runtime and load balance can be improved by calculating the departure point at the theta point and then interpolating to the u and v points
- This approximation is only valid at higher resolutions
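A minimal sketch of the idea, for a single component of the departure point and assuming a simple two-point average onto the C-grid u and v positions; the interpolation actually used in the UM is not specified in the slides, and the function and array names here are invented.

```c
/* Illustrative sketch: derive u- and v-point departure points by averaging
 * the departure points already computed at neighbouring theta points on an
 * Arakawa C-grid (u points lie between theta points in i, v points in j).
 * Shown for one component of the departure point only. */
void interp_departure_points(int nx, int ny,
                             const double *dep_theta, /* [ny][nx]   theta-point values          */
                             double *dep_u,           /* [ny][nx-1] u points between i and i+1  */
                             double *dep_v)           /* [ny-1][nx] v points between j and j+1  */
{
    for (int j = 0; j < ny; j++)
        for (int i = 0; i < nx - 1; i++)
            dep_u[j * (nx - 1) + i] =
                0.5 * (dep_theta[j * nx + i] + dep_theta[j * nx + i + 1]);

    for (int j = 0; j < ny - 1; j++)
        for (int i = 0; i < nx; i++)
            dep_v[j * nx + i] =
                0.5 * (dep_theta[j * nx + i] + dep_theta[(j + 1) * nx + i]);
}
```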

Interpolation of Departure Points
- The much simpler algorithm is generally cheaper
- Perfect load balance for the wind calculations
- Load balance remains a (hard) problem for the theta point calculations

  Routine                            Max (s)   Min (s)
  Wind departure points (original)   57.55     35.84
  Wind departure points (new)        14.25     14.23

Convection

Load Balancing - Convection
- Convection load balance is an old problem which we had a solution for on the T3E; it needed reworking as it was coded directly in SHMEM
- The T3E scheme used segment-based dynamic load balancing:
  - Having enough segments to give good balance would harm vector performance
  - Data in segments could be sparse, so data was moved that wasn't used
  - It moved too much data
- That approach wouldn't be appropriate with the SX-6 compute/communicate ratio

Convection - Deep and Shallow

New Load Balancing - Convection
- Load balance separately for deep and shallow convection
- Calculations per point assumed to be equal
- Tuneable threshold for data sizes to move
- Compress data to active points (see the sketch below)
- Communications use one-sided MPI
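The sketch below illustrates only the "compress data to active points" step, for one input field; the function and variable names are invented for the example, and the deep/shallow split, threshold test and one-sided transfers are omitted.

```c
/* Illustrative sketch of compressing to active points: gather only the
 * columns where convection is diagnosed into contiguous arrays before the
 * expensive convection calculations, so no work or data movement is spent
 * on inactive columns. */
int compress_active(int ncols, int nlev,
                    const int *active,      /* [ncols] 1 if convection diagnosed     */
                    const double *field,    /* [ncols][nlev] full field              */
                    double *field_c,        /* [<=ncols][nlev] compressed field      */
                    int *index)             /* mapping back to the original columns  */
{
    int nact = 0;
    for (int i = 0; i < ncols; i++) {
        if (active[i]) {
            index[nact] = i;                /* remember where the column came from */
            for (int k = 0; k < nlev; k++)
                field_c[nact * nlev + k] = field[i * nlev + k];
            nact++;
        }
    }
    return nact;                            /* number of active columns on this CPU */
}
```

The per-CPU count of active columns returned here is the kind of quantity a balancing step could compare against the tuneable threshold before deciding how much data is worth moving to under-loaded CPUs.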

Shallow and Deep Convection Load Balance
[Figure: time (s) per CPU for shallow and deep convection, original vs balanced]

Short Wave Radiation

Short Wave Radiation - incoming flux (January)

Load Balancing - Short Wave Radiation
- Similar approach to convection
- Short wave flux calculations take 90% of the time, so only this part is balanced
- Computation is relatively more expensive than for convection
[Figure: short wave radiation time (s) per CPU]

Communications

Communications
- One-sided MPI communications have advantages on the SX-6
- No need to explicitly schedule communications (MPI_Get from the under-loaded CPU)
- Speed! Example: processor pairs exchanging 1000 words

  Method                             Time on 2 Nodes (16 CPU)
  MPI_Get (global memory)            32.3 msec
  Buffered Send/Recv                 6.2 msec
  Individual Send/Recv               21.7 msec
  Buffered MPI_Get (global memory)   5.8 msec
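A minimal sketch of the one-sided pattern, using standard MPI-2 calls with fence synchronisation for brevity; the slide's "global memory" variant would additionally place the exposed buffer in globally addressable memory (for example via MPI_Alloc_mem), and in practice the window would be created once and reused rather than per exchange as shown here.

```c
/* Minimal sketch of the one-sided pattern: each rank exposes its buffer in
 * an MPI window and the under-loaded rank pulls the data it needs with
 * MPI_Get, so no explicit send/recv scheduling is required. */
#include <mpi.h>

void exchange_one_sided(double *sendbuf, int nwords,
                        double *recvbuf, int partner, MPI_Comm comm)
{
    MPI_Win win;

    /* Expose the local buffer to remote MPI_Get calls. */
    MPI_Win_create(sendbuf, (MPI_Aint)(nwords * sizeof(double)), sizeof(double),
                   MPI_INFO_NULL, comm, &win);

    MPI_Win_fence(0, win);                        /* open the access epoch        */
    MPI_Get(recvbuf, nwords, MPI_DOUBLE,          /* pull nwords ...              */
            partner, 0, nwords, MPI_DOUBLE, win); /* ... from the partner's window */
    MPI_Win_fence(0, win);                        /* close epoch: data now valid  */

    MPI_Win_free(&win);
}
```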

Communications - ideas and opportunities
- Many unnecessary barriers in the code (T3E relics!); easily removed for immediate benefit
- Gathering/scattering 3D fields level-by-level was optimised by copying into temporary buffers and doing one communication per CPU pair, halving the cost of these communications (see the sketch below)
- There are > 6000 halo exchanges in a 6-hour forecast; can we amalgamate any? Can we remove any?
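A sketch of the level-amalgamation optimisation on the send side, assuming all levels destined for one CPU are simply concatenated into a scratch buffer; names are illustrative and the matching receive/unpack is omitted.

```c
/* Illustrative sketch of amalgamating per-level messages: instead of one
 * send per model level, all levels destined for the same CPU are copied
 * into a temporary buffer and shipped in a single message. */
#include <mpi.h>
#include <string.h>

void send_levels_packed(double *const levels[],  /* nlev pointers to level data   */
                        int nlev, int npts,      /* points per level on this CPU  */
                        double *tmp,             /* scratch, at least nlev * npts */
                        int dest, MPI_Comm comm)
{
    /* Pack every level for this destination into one contiguous buffer ... */
    for (int k = 0; k < nlev; k++)
        memcpy(tmp + (size_t)k * npts, levels[k], npts * sizeof(double));

    /* ... then one send per CPU pair instead of one send per level. */
    MPI_Send(tmp, nlev * npts, MPI_DOUBLE, dest, /*tag=*/0, comm);
}
```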

Questions?