HPC parallelization of oceanographic models via high-level techniques
Piero Lanucara, Vittorio Ruggiero (CASPUR); Vincenzo Artale, Andrea Bargagli, Adriana Carillo, Gianmaria Sannino (ENEA CR Casaccia, Roma, Italy). HPC parallelization of oceanographic models via high-level techniques p.1/27
CASPUR ENEA cooperation

ENEA CR CASACCIA CLIMATE PROJECT AIM: to study the role of oceans in climate change using numerical models.

CASPUR APPLIED MATH. GROUP MISSION: serial code optimization and parallelization.

The target is the development of HPC oceanographic models.
Princeton Ocean Model

One of the most popular three-dimensional oceanographic models. The serial code is widely used and can be downloaded from the web.

Some CASPUR ENEA contributions to the POM community:
Princeton Ocean Model (three figure-only slides; graphics not preserved in the transcription)
POM parallelization roadmap

Many parallel POM codes are available in the literature.

CASPUR ENEA CR CASACCIA parallelization results:
- MPI parallel code (2000)
- Hybrid MPI+OpenMP parallel code (2001)

A lot of good work was done. This work was the starting point for the parallelization of a different release of POM, entirely developed and maintained by ENEA CR CASACCIA.
The EPOM code

A modified version of POM developed by ENEA CR CASACCIA:
- Fully Fortran 90 (dynamic allocation, array syntax)
- New diagnostic routines
- New advection scheme (source code available for POM users)
- New mixed zeta-sigma vertical coordinates (based on: Mellor, G. L., S. Hakkinen, T. Ezer and R. Patchen, "A generalization of a sigma coordinate ocean model and an intercomparison of model vertical grids", Ocean Forecasting, 2002)
EPOM parallelization roadmap

Question: could the people at ENEA CR CASACCIA keep full control of the parallel EPOM code, so as to maintain and update it themselves?

Answer: not if MPI is used. Why?
- The typical researcher has good experience in Fortran programming but almost no expertise in parallel computing.
- An MPI code is unreadable for a non-expert user.
- The effort to maintain an MPI program grows with its complexity and with the number of code rearrangements.
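The point about readability can be made concrete even without real MPI calls: a toy Python sketch (all names hypothetical, no actual message passing) of the bookkeeping that a 1-D domain decomposition forces onto the code owner — partition sizes, ghost cells, and explicit halo updates — none of which exists in the serial program.

```python
# Toy 1-D "halo exchange" without real MPI: each subdomain carries one
# ghost cell per side, and the programmer must keep them consistent by hand.
def decompose(n, nprocs):
    """Split n interior points into nprocs contiguous chunks."""
    base, rem = divmod(n, nprocs)
    return [base + (1 if p < rem else 0) for p in range(nprocs)]

def exchange_halos(subdomains):
    """Copy neighbouring interior values into each subdomain's ghost cells."""
    for p, sub in enumerate(subdomains):
        if p > 0:                        # left ghost <- right interior of p-1
            sub[0] = subdomains[p - 1][-2]
        if p < len(subdomains) - 1:      # right ghost <- left interior of p+1
            sub[-1] = subdomains[p + 1][1]

# Global field of 8 points, decomposed over 2 "processes",
# each padded with one ghost cell on each side.
field = list(range(8))
sizes = decompose(len(field), 2)
subs, start = [], 0
for s in sizes:
    subs.append([0] + field[start:start + s] + [0])   # ghost | interior | ghost
    start += s
exchange_halos(subs)
print(subs)   # interior neighbours are now visible through the ghost cells
```

With explicit MPI every stencil loop in a model the size of EPOM drags this machinery along, which is exactly the burden a directive-based tool hides.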
EPOM parallelization dream

A tool based on high-level techniques: all of the low-level serial and parallel machinery (data partitioning, communications, I/O, ...) is specified via compiler directives and carried out by the compiler, hidden from the user... with the same performance and portability as MPI.

We oriented our choice towards the Scalable Modeling System (SMS) toolkit. SMS is entirely developed and maintained by the Advanced Computing Branch of the Aviation Division of NOAA. We thank the SMS staff for their support!
Scalable Modeling System (three figure-only slides; graphics not preserved in the transcription)
The advq routine of SMS EPOM

      subroutine advq(qb,q,dti2,qf)
      use memory
      implicit none
      integer :: i,j,k,it
      real :: dti2
CSMS$DISTRIBUTE(pom_dec,1,2) BEGIN
      real :: qb(im,jm,kb),q(im,jm,kb),qf(im,jm,kb),
     $        xflux(im,jm,kb),yflux(im,jm,kb)
CSMS$DISTRIBUTE END
CSMS$PARALLEL(pom_dec,<i>,<j>) BEGIN
      xflux(:,:,:)=0.e0
      yflux(:,:,:)=0.e0
CSMS$CHECK_HALO(q<1:0,1:0>, 'halo of q in advq')
      do k=2,kbm1
        do j=2,jm
          do i=2,im
            xflux(i,j,k)=.25e0*(q(i,j,k)+q(i-1,j,k))*
     $                   (u(i,j,k)+u(i,j,k-1))
            yflux(i,j,k)=.25e0*(q(i,j,k)+q(i,j-1,k))*
     $                   (v(i,j,k)+v(i,j,k-1))
          enddo
        enddo
      enddo
      do k=2,kbm1
        do j=2,jm
          do i=2,im
            xflux(i,j,k)=xflux(i,j,k)-.5e0*(aam(i,j,k)+
     $                   aam(i-1,j,k)+
     $                   aam(i,j,k-1)+aam(i-1,j,k-1))
     $                   *(qb(i,j,k)-qb(i-1,j,k))*dum(i,j)
     $                   /(dx(i,j)+dx(i-1,j))
            yflux(i,j,k)=yflux(i,j,k)-.5e0*(aam(i,j,k)+
     $                   aam(i,j-1,k)+
     $                   aam(i,j,k-1)+aam(i,j-1,k-1))
     $                   *(qb(i,j,k)-qb(i,j-1,k))*dvm(i,j)
     $                   /(dy(i,j)+dy(i,j-1))
            xflux(i,j,k)=.25e0*(dy(i,j)+dy(i-1,j))
     $                   *(dzz(i,j,k-1)+dzz(i-1,j,k-1))*
     $                   xflux(i,j,k)
            yflux(i,j,k)=.25e0*(dx(i,j)+dx(i,j-1))
     $                   *(dzz(i,j,k-1)+dzz(i,j-1,k-1))*
     $                   yflux(i,j,k)
          enddo
        enddo
      enddo
CSMS$EXCHANGE(xflux<0:1,0:0>, yflux<0:0,0:1>)
      do k=2,kbm1
        do j=2,jmm1
          do i=2,imm1
            qf(i,j,k)=.5e0*( w(i,j,k-1)*q(i,j,k-1)
     $                -w(i,j,k+1)*q(i,j,k+1))*art(i,j)
     $                +xflux(i+1,j,k)-xflux(i,j,k)
     $                +yflux(i,j+1,k)-yflux(i,j,k)
            qf(i,j,k)=((dzb(i,j,k)+dzb(i,j,k-1))*art(i,j)*
     $                qb(i,j,k)
     $                -2.*dti2*qf(i,j,k))/((dzf(i,j,k)+
     $                dzf(i,j,k-1))*art(i,j))
          enddo
        enddo
      enddo
CSMS$COMPARE_VAR(qf, 'advq')
CSMS$PARALLEL END
      return
      end
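The CSMS$EXCHANGE directive in advq requests one halo cell of xflux in the positive-i direction and one of yflux in the positive-j direction, because the final loop reads xflux(i+1,j,k) and yflux(i,j+1,k). A minimal Python sketch (not SMS code; the rule it encodes is only an illustration) of how the stencil offsets determine those halo widths:

```python
# The qf update reads xflux(i+1,j,k) and yflux(i,j+1,k); the halo widths
# requested by CSMS$EXCHANGE(xflux<0:1,0:0>, yflux<0:0,0:1>) follow
# directly from the stencil offsets.
def halo_widths(offsets):
    """Per-axis (low, high) halo widths needed to evaluate a stencil."""
    lows  = [max(0, -min(off[a] for off in offsets)) for a in range(2)]
    highs = [max(0,  max(off[a] for off in offsets)) for a in range(2)]
    return list(zip(lows, highs))

# index offsets (di, dj) at which each array is read in the qf loop
xflux_reads = [(0, 0), (1, 0)]   # xflux(i,j,k) and xflux(i+1,j,k)
yflux_reads = [(0, 0), (0, 1)]   # yflux(i,j,k) and yflux(i,j+1,k)
print(halo_widths(xflux_reads))  # [(0, 1), (0, 0)]  ->  xflux<0:1,0:0>
print(halo_widths(yflux_reads))  # [(0, 0), (0, 1)]  ->  yflux<0:0,0:1>
```

The same reasoning explains CSMS$CHECK_HALO(q<1:0,1:0>, ...): the flux loops read q(i-1,j,k) and q(i,j-1,k), so one halo cell on the low side of each horizontal axis must be valid on entry.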
The IBM SP4 at CASPUR

One IBM 690 server: Power4, 1.3 GHz, 64 GB RAM.
Mediterranean Sea Test Case

grid points (count not preserved in the transcription); land points are unused in the TERRA configuration. Use NOTERRA to skip computation on land points.

The serial code runs for about 15 seconds on a single Power4 for a single time step ( day).
Mediterranean Sea Test Case: salinity and temperature plots up to 10 meters depth (figure not preserved in the transcription).

Mediterranean Sea Test Case: salinity and temperature plots up to 250 meters depth (figure not preserved in the transcription).
Performance results: TERRA

Very good performance: a speedup of almost 21 on 32 CPUs (table not preserved in the transcription).
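The quoted numbers can be unpacked with a little arithmetic: a speedup of 21 on 32 CPUs is about 66% parallel efficiency and, if Amdahl's law is taken at face value, corresponds to a serial fraction of under 2%. A quick check in Python:

```python
# Efficiency and implied Amdahl serial fraction for speedup S = 21 on p = 32 CPUs.
p, S = 32, 21.0
efficiency = S / p                      # fraction of ideal linear speedup
serial_frac = (p / S - 1) / (p - 1)     # invert Amdahl: S = 1/(s + (1-s)/p)
print(f"efficiency  = {efficiency:.1%}")    # ~65.6%
print(f"serial frac = {serial_frac:.1%}")   # ~1.7%
```

Amdahl's law ignores communication and load imbalance, so the true serial fraction is smaller still; the estimate just bounds how well the SMS-generated code scales on this machine.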
Performance results: NOTERRA

The performance degradation can be explained by the reduced amount of computation and increasing load imbalance, but also by cache misses during the EXCHANGE step.
Future Developments

Coupling with higher-resolution sub-models.
Mediterranean Sea circulation

The circulation in the Mediterranean Sea is characterized by different oceanographic features localized in different areas, requiring different resolutions. Examples: the Gibraltar Strait and the Sicily Channel.

Coupling the Mediterranean model (resolution about 10 km) with sub-area models (resolution about 1 km) is an efficient way to combine reasonable computation times with the need for higher resolution in some areas.
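A rough cost estimate shows why nesting pays off. Refining the whole basin from 10 km to 1 km multiplies both horizontal grid dimensions by 10 and, through the CFL condition, shrinks the time step by roughly a factor of 10 as well, so the total work grows by about a factor of 1000. Illustrative Python arithmetic (order-of-magnitude numbers only, not from the talk):

```python
# Cost of uniform refinement vs. nesting (illustrative back-of-envelope).
refine = 10                        # 10 km -> 1 km horizontal resolution
cost_factor = refine**2 * refine   # nx and ny grow 10x each; CFL cuts dt by ~10x
print(cost_factor)                 # ~1000x the work of the coarse run
# With nesting, that factor applies only over the small sub-areas, while
# the rest of the basin stays at the affordable 10 km resolution.
```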
Ligurian Sea coupling example

Three independent interacting processes:
- Process C: baroclinic circulation in the Coarse region
- Process F: baroclinic circulation in the Fine region
- Process D: barotropic circulation in the Entire area at the coarse resolution, and driver of all the communications
Ligurian Sea coupling advantages

- The same model is applied in all areas (only one code is needed).
- All three processes are parallel (SMS parallelization).
- The barotropic computation applied to the entire area limits the communications to the internal time steps (the internal-to-external time step ratio is usually 60) and avoids possible noise at the boundary between the coarse and fine regions.
- We plan to use the Model Coupling Toolkit from the SciDAC project for an efficient parallel coupling.
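The benefit of that ratio can be made concrete: if coupling data had to be exchanged at every external (barotropic) step, each internal step would trigger about 60 communications; handling the barotropic mode globally in the driver process reduces this to one. A trivial Python sketch of the bookkeeping (illustrative only):

```python
# Coupling communications per internal (baroclinic) time step, assuming the
# usual internal-to-external time step ratio of 60 quoted on the slide.
ratio = 60
msgs_naive  = ratio   # couple at every external (barotropic) step
msgs_driver = 1       # driver process D handles the barotropic mode globally
print(msgs_naive // msgs_driver)   # factor of 60 fewer coupling messages
```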
More informationAcuSolve Performance Benchmark and Profiling. October 2011
AcuSolve Performance Benchmark and Profiling October 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox, Altair Compute
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationBenchmark results on Knight Landing architecture
Benchmark results on Knight Landing architecture Domenico Guida, CINECA SCAI (Bologna) Giorgio Amati, CINECA SCAI (Roma) Milano, 21/04/2017 KNL vs BDW A1 BDW A2 KNL cores per node 2 x 18 @2.3 GHz 1 x 68
More informationImplementation of an integrated efficient parallel multiblock Flow solver
Implementation of an integrated efficient parallel multiblock Flow solver Thomas Bönisch, Panagiotis Adamidis and Roland Rühle adamidis@hlrs.de Outline Introduction to URANUS Why using Multiblock meshes
More informationDeveloping the TELEMAC system for HECToR (phase 2b & beyond) Zhi Shang
Developing the TELEMAC system for HECToR (phase 2b & beyond) Zhi Shang Outline of the Talk Introduction to the TELEMAC System and to TELEMAC-2D Code Developments Data Reordering Strategy Results Conclusions
More informationCESM (Community Earth System Model) Performance Benchmark and Profiling. August 2011
CESM (Community Earth System Model) Performance Benchmark and Profiling August 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell,
More informationParallel Programming Concepts. Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04. Parallel Background. Why Bother?
Parallel Programming Concepts Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04 Parallel Background Why Bother? 1 What is Parallel Programming? Simultaneous use of multiple
More informationICON for HD(CP) 2. High Definition Clouds and Precipitation for Advancing Climate Prediction
ICON for HD(CP) 2 High Definition Clouds and Precipitation for Advancing Climate Prediction High Definition Clouds and Precipitation for Advancing Climate Prediction ICON 2 years ago Parameterize shallow
More informationMPAS-O: Plan for 2013
MPAS-O: Plan for 2013 Todd Ringler Theoretical Division Los Alamos National Laboratory Climate, Ocean and Sea-Ice Modeling Project http://public.lanl.gov/ringler/ringler.html Staffing, Effort and Focus
More informationToward An Integrated Cluster File System
Toward An Integrated Cluster File System Adrien Lebre February 1 st, 2008 XtreemOS IP project is funded by the European Commission under contract IST-FP6-033576 Outline Context Kerrighed and root file
More informationIn 1986, I had degrees in math and engineering and found I wanted to compute things. What I ve mostly found is that:
Parallel Computing and Data Locality Gary Howell In 1986, I had degrees in math and engineering and found I wanted to compute things. What I ve mostly found is that: Real estate and efficient computation
More informationOptimizing an Earth Science Atmospheric Application with the OmpSs Programming Model
www.bsc.es Optimizing an Earth Science Atmospheric Application with the OmpSs Programming Model HPC Knowledge Meeting'15 George S. Markomanolis, Jesus Labarta, Oriol Jorba University of Barcelona, Barcelona,
More informationNew Features of HYCOM. Alan J. Wallcraft Naval Research Laboratory. 16th Layered Ocean Model Workshop
New Features of HYCOM Alan J. Wallcraft Naval Research Laboratory 16th Layered Ocean Model Workshop May 23, 2013 Mass Conservation - I Mass conservation is important for climate studies It is a powerfull
More informationCS 475: Parallel Programming Introduction
CS 475: Parallel Programming Introduction Wim Bohm, Sanjay Rajopadhye Colorado State University Fall 2014 Course Organization n Let s make a tour of the course website. n Main pages Home, front page. Syllabus.
More informationBMGTOOLS: A COMMUNITY TOOL TO HANDLE MODEL GRID AND BATHYMETRY
#49-April 2014-94 BMGTOOLS: A COMMUNITY TOOL TO HANDLE MODEL GRID AND BATHYMETRY By S. Theetten (1), B. Thiébault (2), F. Dumas (1), J. Paul (3) 1 2 ARTENUM, Toulouse, France 3 MERCATOR OCEAN, Toulouse,
More informationRunning the FIM and NIM Weather Models on GPUs
Running the FIM and NIM Weather Models on GPUs Mark Govett Tom Henderson, Jacques Middlecoff, Jim Rosinski, Paul Madden NOAA Earth System Research Laboratory Global Models 0 to 14 days 10 to 30 KM resolution
More informationParallel Computing Why & How?
Parallel Computing Why & How? Xing Cai Simula Research Laboratory Dept. of Informatics, University of Oslo Winter School on Parallel Computing Geilo January 20 25, 2008 Outline 1 Motivation 2 Parallel
More informationPerformance Analysis of the MPAS-Ocean Code using HPCToolkit and MIAMI
Performance Analysis of the MPAS-Ocean Code using HPCToolkit and MIAMI Gabriel Marin February 11, 2014 MPAS-Ocean [4] is a component of the MPAS framework of climate models. MPAS-Ocean is an unstructured-mesh
More informationOptimal Combining Data for Improving Ocean Modeling
DISTRIBUTION STATEMENT A: Approved for public release; distribution is unlimited. Optimal Combining Data for Improving Ocean Modeling L.I.Piterbarg University of Southern California Center of Applied Mathematical
More informationCo-array Fortran Performance and Potential: an NPB Experimental Study. Department of Computer Science Rice University
Co-array Fortran Performance and Potential: an NPB Experimental Study Cristian Coarfa Jason Lee Eckhardt Yuri Dotsenko John Mellor-Crummey Department of Computer Science Rice University Parallel Programming
More informationHigh Performance Computing: Architecture, Applications, and SE Issues. Peter Strazdins
High Performance Computing: Architecture, Applications, and SE Issues Peter Strazdins Department of Computer Science, Australian National University e-mail: peter@cs.anu.edu.au May 17, 2004 COMP1800 Seminar2-1
More informationFujitsu s Approach to Application Centric Petascale Computing
Fujitsu s Approach to Application Centric Petascale Computing 2 nd Nov. 2010 Motoi Okuda Fujitsu Ltd. Agenda Japanese Next-Generation Supercomputer, K Computer Project Overview Design Targets System Overview
More informationPerformance Analysis for Large Scale Simulation Codes with Periscope
Performance Analysis for Large Scale Simulation Codes with Periscope M. Gerndt, Y. Oleynik, C. Pospiech, D. Gudu Technische Universität München IBM Deutschland GmbH May 2011 Outline Motivation Periscope
More informationWRF Software Architecture. John Michalakes, Head WRF Software Architecture Michael Duda Dave Gill
WRF Software Architecture John Michalakes, Head WRF Software Architecture Michael Duda Dave Gill Outline Introduction Computing Overview WRF Software Overview Introduction WRF Software Characteristics
More informationMPI Casestudy: Parallel Image Processing
MPI Casestudy: Parallel Image Processing David Henty 1 Introduction The aim of this exercise is to write a complete MPI parallel program that does a very basic form of image processing. We will start by
More informationOpenFOAM on BG/Q porting and performance
OpenFOAM on BG/Q porting and performance Paride Dagna, SCAI Department, CINECA SYSTEM OVERVIEW OpenFOAM : selected application inside of PRACE project Fermi : PRACE Tier- System Model: IBM-BlueGene /Q
More informationCAF versus MPI Applicability of Coarray Fortran to a Flow Solver
CAF versus MPI Applicability of Coarray Fortran to a Flow Solver Manuel Hasert, Harald Klimach, Sabine Roller m.hasert@grs-sim.de Applied Supercomputing in Engineering Motivation We develop several CFD
More informationMaximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
More informationICON Performance Benchmark and Profiling. March 2012
ICON Performance Benchmark and Profiling March 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute resource - HPC
More informationCUDA GPGPU Workshop 2012
CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline
More informationMPI and OpenMP Paradigms on Cluster of SMP Architectures: the Vacancy Tracking Algorithm for Multi-Dimensional Array Transposition
MPI and OpenMP Paradigms on Cluster of SMP Architectures: the Vacancy Tracking Algorithm for Multi-Dimensional Array Transposition Yun He and Chris H.Q. Ding NERSC Division, Lawrence Berkeley National
More informationScalable Dynamic Load Balancing of Detailed Cloud Physics with FD4
Center for Information Services and High Performance Computing (ZIH) Scalable Dynamic Load Balancing of Detailed Cloud Physics with FD4 Minisymposium on Advances in Numerics and Physical Modeling for Geophysical
More informationCMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)
CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer
More informationCompilation for Heterogeneous Platforms
Compilation for Heterogeneous Platforms Grid in a Box and on a Chip Ken Kennedy Rice University http://www.cs.rice.edu/~ken/presentations/heterogeneous.pdf Senior Researchers Ken Kennedy John Mellor-Crummey
More informationNumerical Ocean Modeling and Simulation with CUDA
Numerical Ocean Modeling and Simulation with CUDA Jason Mak California Polytechnic State University Email: jamak@calpoly.edu Paul Choboter California Polytechnic State University Email: pchobote@calpoly.edu
More informationOcean Simulations using MPAS-Ocean
Ocean Simulations using MPAS-Ocean Mark Petersen and the MPAS-Ocean development team Los Alamos National Laboratory U N C L A S S I F I E D Slide 1 Progress on MPAS-Ocean in 2010 MPAS-Ocean is a functioning
More informationNEMO-MED: OPTIMIZATION AND IMPROVEMENT OF SCALABILITY
Research Papers Issue RP0096 January 2011 Scientific Computing and Operation (SCO) NEMO-MED: OPTIMIZATION AND IMPROVEMENT OF SCALABILITY By Italo Epicoco University of Salento, Italy italo.epicoco@unisalento.it
More informationOutline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??
Outline EEL 7 Graduate Computer Architecture Chapter 3 Limits to ILP and Simultaneous Multithreading! Limits to ILP! Thread Level Parallelism! Multithreading! Simultaneous Multithreading Ann Gordon-Ross
More informationDRAFT. ATMET Technical Note. Common Customizations for RAMS v6.0. Number 2. Prepared by: ATMET, LLC PO Box Boulder, Colorado
DRAFT ATMET Technical Note Number 2 Common Customizations for RAMS v60 Prepared by: ATMET, LLC PO Box 19195 Boulder, Colorado 80308-2195 October 2006 Table of Contents 1 Adding Array Space to the Model
More informationAnaliza in optimizacija implementacije modela NAPOM za Morsko biološko Postajo.
Inštitut Jožef Stefan 1000 Ljubljana, Jamova 39 Odsek E6 - KOMUNIKACIJSKI SISTEMI IJS-DP-11167 Analiza in optimizacija implementacije modela NAPOM za Morsko biološko Postajo. M. Depolli, J. Ugovšek, R.
More informationBenefits of Programming Graphically in NI LabVIEW
Benefits of Programming Graphically in NI LabVIEW Publish Date: Jun 14, 2013 0 Ratings 0.00 out of 5 Overview For more than 20 years, NI LabVIEW has been used by millions of engineers and scientists to
More informationBenefits of Programming Graphically in NI LabVIEW
1 of 8 12/24/2013 2:22 PM Benefits of Programming Graphically in NI LabVIEW Publish Date: Jun 14, 2013 0 Ratings 0.00 out of 5 Overview For more than 20 years, NI LabVIEW has been used by millions of engineers
More information