HPC parallelization of oceanographic models via high-level techniques


Piero Lanucara, Vittorio Ruggiero (CASPUR)
Vincenzo Artale, Andrea Bargagli, Adriana Carillo, Gianmaria Sannino (ENEA Casaccia, Roma, Italy)

CASPUR ENEA cooperation
ENEA CR CASACCIA climate project aims: to study the role of the oceans in climate change using numerical models
CASPUR applied math group mission: serial code optimization and parallelization
The target is the development of HPC oceanographic models

Princeton Ocean Model
One of the most popular three-dimensional oceanographic models
The serial code is widely used and can be downloaded from the web
Some CASPUR ENEA contributions to the POM community:

Princeton Ocean Model [three figure slides]

POM parallelization roadmap
Many parallel POM codes are available in the literature
CASPUR ENEA CR CASACCIA parallelization results: an MPI parallel code (2000) and a hybrid MPI+OpenMP parallel code (2001)
A lot of good work was done
This work was the starting point for the parallelization of a different release of POM, entirely developed and maintained by ENEA CR CASACCIA

The EPOM code
A modified version of POM developed by ENEA CR CASACCIA
Fully Fortran 90 (dynamic allocation, array syntax)
New diagnostic routines
New advection scheme (source code available to POM users)
New mixed zeta-sigma vertical coordinates (based on: Mellor, G. L., S. Hakkinen, T. Ezer and R. Patchen, "A generalization of a sigma coordinate ocean model and an intercomparison of model vertical grids", Ocean Forecasting, 2002)
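As background for the vertical-coordinate change (this reminder is not on the original slide): POM's classical sigma coordinate maps the water column between the free surface eta and the bottom depth H onto a fixed interval,

  \sigma = \frac{z - \eta}{H + \eta}, \qquad
  z = \eta + \sigma\,(H + \eta), \qquad -1 \le \sigma \le 0 ,

so sigma runs from 0 at the surface to -1 at the bottom regardless of the local depth; the mixed zeta-sigma coordinates of the cited paper generalize this pure sigma mapping.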

EPOM parallelization roadmap
Question: could people at ENEA CR CASACCIA keep full control of the parallel EPOM code, in order to maintain and update it?
Answer: not if MPI is used. Why?
The typical researcher has good experience in Fortran programming but nearly no expertise in parallel computing
An MPI code is unreadable for a non-expert user
The effort needed to maintain an MPI program grows with its complexity and with the number of code rearrangements

EPOM parallelization dream
A tool based on high-level techniques: all of the low-level serial and parallel machinery (data partitioning, communications, I/O, ...) is defined via compiler directives and performed in a way that is completely hidden from the user
...with the same performance and portability as MPI
We oriented our choice towards the Scalable Modeling System (SMS) toolkit
SMS is entirely developed and maintained by the Advanced Computing Branch of the Aviation Division of NOAA. We thank the SMS staff for their support!
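To make the directive style concrete before the real example, here is a minimal sketch (ours, not from the slides) of how a simple 2-D smoothing loop could be annotated with the SMS directives that appear in the advq listing below (DISTRIBUTE, PARALLEL, EXCHANGE). The decomposition name pom_dec is reused from that listing, its declaration and creation are omitted, and the halo widths are illustrative.

!     Minimal SMS-annotated loop (illustrative sketch).  The directives are
!     Fortran comments, so the same source still compiles and runs serially.
      subroutine smooth(a, b, im, jm)
      implicit none
      integer :: im, jm, i, j
CSMS$DISTRIBUTE(pom_dec,1,2) BEGIN
      real :: a(im,jm), b(im,jm)
CSMS$DISTRIBUTE END
CSMS$PARALLEL(pom_dec,<i>,<j>) BEGIN
!     update the 1-wide halo of a before its neighbours are read
CSMS$EXCHANGE(a<1:1,1:1>)
      do j=2,jm-1
        do i=2,im-1
          b(i,j)=0.2e0*(a(i,j)+a(i-1,j)+a(i+1,j)+a(i,j-1)+a(i,j+1))
        enddo
      enddo
CSMS$PARALLEL END
      return
      end

Because all SMS directives are plain Fortran comments, the annotated source remains a valid serial program; the SMS preprocessor turns it into the message-passing version.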

Scalable Modeling System [three figure slides]

The advq routine of SMS EPOM

      subroutine advq(qb,q,dti2,qf)
      use memory
      implicit none
      integer :: i,j,k,it
      real :: dti2
! SMS: declare the arrays below as distributed over the pom_dec decomposition
CSMS$DISTRIBUTE(pom_dec,1,2) BEGIN
      real :: qb(im,jm,kb),q(im,jm,kb),qf(im,jm,kb),
     $        xflux(im,jm,kb),yflux(im,jm,kb)
CSMS$DISTRIBUTE END
! SMS: run the i,j loops below over each process's local portion of pom_dec
CSMS$PARALLEL(pom_dec,<i>,<j>) BEGIN
      xflux(:,:,:)=0.e0
      yflux(:,:,:)=0.e0
! SMS debugging aid: check that the halo region of q is up to date here
CSMS$CHECK_HALO(q<1:0,1:0>, 'halo of q in advq')
      do k=2,kbm1
        do j=2,jm
          do i=2,im
            xflux(i,j,k)=.25e0*(q(i,j,k)+q(i-1,j,k))*
     $                   (u(i,j,k)+u(i,j,k-1))
            yflux(i,j,k)=.25e0*(q(i,j,k)+q(i,j-1,k))*
     $                   (v(i,j,k)+v(i,j,k-1))
          enddo
        enddo
      enddo
      do k=2,kbm1
        do j=2,jm
          do i=2,im
            xflux(i,j,k)=xflux(i,j,k)-.5e0*(aam(i,j,k)+aam(i-1,j,k)+
     $                   aam(i,j,k-1)+aam(i-1,j,k-1))
     $                   *(qb(i,j,k)-qb(i-1,j,k))*dum(i,j)
     $                   /(dx(i,j)+dx(i-1,j))
            yflux(i,j,k)=yflux(i,j,k)-.5e0*(aam(i,j,k)+aam(i,j-1,k)+
     $                   aam(i,j,k-1)+aam(i,j-1,k-1))
     $                   *(qb(i,j,k)-qb(i,j-1,k))*dvm(i,j)
     $                   /(dy(i,j)+dy(i,j-1))
            xflux(i,j,k)=.25e0*(dy(i,j)+dy(i-1,j))
     $                   *(dzz(i,j,k-1)+dzz(i-1,j,k-1))*xflux(i,j,k)
            yflux(i,j,k)=.25e0*(dx(i,j)+dx(i,j-1))
     $                   *(dzz(i,j,k-1)+dzz(i,j-1,k-1))*yflux(i,j,k)
          enddo
        enddo
      enddo
! SMS: update the halo points of the fluxes before they are differenced below
CSMS$EXCHANGE(xflux<0:1,0:0>,yflux<0:0,0:1>)
      do k=2,kbm1
        do j=2,jmm1
          do i=2,imm1
            qf(i,j,k)=.5e0*( w(i,j,k-1)*q(i,j,k-1)
     $               -w(i,j,k+1)*q(i,j,k+1))*art(i,j)
     $               +xflux(i+1,j,k)-xflux(i,j,k)
     $               +yflux(i,j+1,k)-yflux(i,j,k)
            qf(i,j,k)=((dzb(i,j,k)+dzb(i,j,k-1))*art(i,j)*qb(i,j,k)
     $               -2.*dti2*qf(i,j,k))/((dzf(i,j,k)+
     $               dzf(i,j,k-1))*art(i,j))
          enddo
        enddo
      enddo
! SMS debugging aid: compare qf against a reference run
CSMS$COMPARE_VAR(qf, 'advq')
CSMS$PARALLEL END
      return
      end
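For comparison (this is our illustration, not part of the slides), a single CSMS$EXCHANGE line such as the one above stands in for hand-written MPI halo-exchange code of roughly the following shape. Everything here is a hedged sketch: the communicator comm2d, the neighbour ranks i_left/i_right, the owned index range is:ie and the derived datatype column_type are hypothetical names that would come from the code's own domain-decomposition setup (e.g. built with MPI_Type_vector); at the domain edges the neighbour ranks would be MPI_PROC_NULL, which MPI_Sendrecv handles automatically.

!     Hedged sketch only: a hand-coded equivalent of exchanging the
!     upper-i halo of xflux (the <0:1,0:0> part of the directive above).
!     comm2d, i_left, i_right, is, ie and column_type are hypothetical
!     names standing for the usual decomposition bookkeeping.
      subroutine exchange_xflux(xflux, im, jm, kb, is, ie,
     $                          i_left, i_right, column_type, comm2d)
      implicit none
      include 'mpif.h'
      integer :: im, jm, kb, is, ie
      integer :: i_left, i_right, column_type, comm2d
      integer :: status(MPI_STATUS_SIZE), ierr
      real    :: xflux(im,jm,kb)
!     send my first owned i-column to the left neighbour and receive the
!     right neighbour's first owned column into my upper-i halo (ie+1)
      call MPI_Sendrecv(xflux(is,1,1),   1, column_type, i_left,  10,
     $                  xflux(ie+1,1,1), 1, column_type, i_right, 10,
     $                  comm2d, status, ierr)
      return
      end

In an MPI-only version of EPOM, a routine of this kind (plus the datatype and packing logic behind it) is needed for every distributed array and every exchange direction, which is exactly the code the directive hides and the reason the maintenance argument above favours the high-level approach.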

The IBM SP4 at CASPUR
One pSeries 690 server: Power4 CPUs, 1.3 GHz, 64 GB RAM

Mediterranean Sea test case
... grid points
... unused in the TERRA configuration; use NOTERRA to skip computation on land points
The serial code runs for about 15 seconds on a single Power4 for a single time step (... day)

Mediterranean Sea test case: salinity and temperature plots at depths up to 10 meters [figure slide]
Mediterranean Sea test case: salinity and temperature plots at depths up to 250 meters [figure slide]

Performance results: TERRA
Very good performance: the table (not reproduced in this transcription) shows a speedup of almost 21 on 32 CPUs
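As a quick worked check (added here, not on the slide), a speedup of about 21 on 32 CPUs corresponds to a parallel efficiency of

  E_{32} = \frac{S_{32}}{32} \approx \frac{21}{32} \approx 0.66 ,

i.e. roughly two thirds of ideal scaling.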

Performance results: NOTERRA
The performance degradation can be explained by the reduced amount of computation and the increased load imbalance, but also by cache misses during the EXCHANGE step

Future developments
Coupling with higher-resolution sub-models

Mediterranean Sea circulation
The circulation in the Mediterranean Sea is characterized by different oceanographic features, localized in different areas and requiring different resolutions
Examples: the Gibraltar Strait and the Sicily Channel
Coupling the Mediterranean model (resolution about 10 km) with sub-area models (resolution about 1 km) is an efficient way to combine reasonable computation times with the need for higher resolution in some areas

Ligurian Sea coupling example
Three independent, interacting processes:
Process C: baroclinic circulation in the coarse region
Process F: baroclinic circulation in the fine region
Process D: barotropic circulation over the entire area at the coarse resolution, and driver of all the communications

Ligurian Sea coupling advantages
The same model is applied in all areas (only one code is needed)
All three processes are themselves parallel (SMS parallelization)
Running the barotropic computation for the entire area in a single process limits the coupling communications to one exchange per internal time step (the internal-to-external time step ratio is usually about 60) and avoids possible noise at the boundary between the coarse and fine regions
We plan to use the Model Coupling Toolkit from the SciDAC project for an efficient parallel coupling; a schematic sketch of the three-process layout follows
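Below is a schematic, hedged sketch of how the three-process layout and the once-per-internal-step exchange described above could be organised with plain MPI. It is not taken from EPOM: the rank assignments, array sizes, tags and the one-way data flow are hypothetical placeholders.

!     Hypothetical skeleton (ours, not from the slides) of the C/F/D layout:
!     rank 0 plays process D (barotropic, whole area, driver of the
!     communications); ranks 1 and 2 play processes C and F (baroclinic,
!     coarse and fine regions).  Routine bodies, the boundary payload and
!     the one-way data flow are illustrative placeholders only.
      program coupling_skeleton
      implicit none
      include 'mpif.h'
      integer, parameter :: nsteps=10, isplit=60, nb=100
      integer :: rank, ierr, n, m
      integer :: status(MPI_STATUS_SIZE)
      real    :: bnd(nb)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      do n=1,nsteps                        ! internal (baroclinic) steps
        if (rank.eq.0) then
!         process D: isplit external (barotropic) steps per internal step
          do m=1,isplit
            bnd=real(n*isplit+m)           ! stand-in for the barotropic solve
          enddo
!         one coupling exchange per internal step, driven by D
          call MPI_Send(bnd, nb, MPI_REAL, 1, n, MPI_COMM_WORLD, ierr)
          call MPI_Send(bnd, nb, MPI_REAL, 2, n, MPI_COMM_WORLD, ierr)
        else if (rank.eq.1 .or. rank.eq.2) then
!         processes C and F: advance the baroclinic fields (omitted) and
!         take the barotropic boundary data from D once per internal step
          call MPI_Recv(bnd, nb, MPI_REAL, 0, n, MPI_COMM_WORLD,
     $                  status, ierr)
        endif
      enddo

      call MPI_Finalize(ierr)
      end

In a real coupling the data flow would presumably be two-way and the payloads far richer; the plan stated on the slide is to let the Model Coupling Toolkit handle these parallel transfers instead of hand-coded sends and receives.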
