HPC parallelization of oceanographic models via high-level techniques
Piero Lanucara, Vittorio Ruggiero (CASPUR); Vincenzo Artale, Andrea Bargagli, Adriana Carillo, Gianmaria Sannino (ENEA CR Casaccia, Roma, Italy). HPC parallelization of oceanographic models via high-level techniques p.1/27
CASPUR ENEA cooperation

ENEA CR CASACCIA CLIMATE PROJECT AIM: to study the role of oceans in climate change using numerical models.

CASPUR APPLIED MATH. GROUP MISSION: serial code optimization and parallelization.

The target is the development of HPC oceanographic models.
Princeton Ocean Model

One of the most popular three-dimensional oceanographic models. The serial code is widely used and can be downloaded from the web.

Some CASPUR ENEA contributions to the POM community:
Princeton Ocean Model (three figure-only slides; graphics not preserved in the transcription)
POM parallelization roadmap

Many parallel POM codes are available in the literature.

CASPUR ENEA CR CASACCIA parallelization results:
- MPI parallel code (2000)
- Hybrid MPI+OpenMP parallel code (2001)

A lot of good work was done. This work was the starting point for the parallelization of a different release of POM, entirely developed and maintained by ENEA CR CASACCIA.
The EPOM code

A modified version of POM developed by ENEA CR CASACCIA:
- Fully Fortran 90 (dynamic allocation, array syntax)
- New diagnostic routines
- New advection scheme (source code available for POM users)
- New mixed zeta-sigma vertical coordinates (based on: Mellor, G. L., S. Hakkinen, T. Ezer and R. Patchen, "A generalization of a sigma coordinate ocean model and an intercomparison of model vertical grids", Ocean Forecasting, 2002)
EPOM parallelization roadmap

Question: could the people at ENEA CR CASACCIA keep full control of the parallel EPOM code, so as to maintain and update it themselves?

Answer: not if MPI is used. Why?
- The typical researcher has good experience in Fortran programming but almost no expertise in parallel computing.
- An MPI code is unreadable for a non-expert user.
- The effort to maintain an MPI program grows with its complexity and with the number of code rearrangements.
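The point about readability can be made concrete even without real MPI calls: a toy Python sketch (all names hypothetical, no actual message passing) of the bookkeeping that a 1-D domain decomposition forces onto the code owner — partition sizes, ghost cells, and explicit halo updates — none of which exists in the serial program.

```python
# Toy 1-D "halo exchange" without real MPI: each subdomain carries one
# ghost cell per side, and the programmer must keep them consistent by hand.
def decompose(n, nprocs):
    """Split n interior points into nprocs contiguous chunks."""
    base, rem = divmod(n, nprocs)
    return [base + (1 if p < rem else 0) for p in range(nprocs)]

def exchange_halos(subdomains):
    """Copy neighbouring interior values into each subdomain's ghost cells."""
    for p, sub in enumerate(subdomains):
        if p > 0:                        # left ghost <- right interior of p-1
            sub[0] = subdomains[p - 1][-2]
        if p < len(subdomains) - 1:      # right ghost <- left interior of p+1
            sub[-1] = subdomains[p + 1][1]

# Global field of 8 points, decomposed over 2 "processes",
# each padded with one ghost cell on each side.
field = list(range(8))
sizes = decompose(len(field), 2)
subs, start = [], 0
for s in sizes:
    subs.append([0] + field[start:start + s] + [0])   # ghost | interior | ghost
    start += s
exchange_halos(subs)
print(subs)   # interior neighbours are now visible through the ghost cells
```

With explicit MPI every stencil loop in a model the size of EPOM drags this machinery along, which is exactly the burden a directive-based tool hides.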
EPOM parallelization dream

A tool based on high-level techniques: all of the low-level serial and parallel machinery (data partitioning, communications, I/O, ...) is specified via compiler directives and carried out by the compiler, hidden from the user... with the same performance and portability as MPI.

We oriented our choice towards the Scalable Modeling System (SMS) toolkit. SMS is entirely developed and maintained by the Advanced Computing Branch of the Aviation Division of NOAA. We thank the SMS staff for their support!
Scalable Modeling System (three figure-only slides; graphics not preserved in the transcription)
The advq routine of SMS EPOM

      subroutine advq(qb,q,dti2,qf)
      use memory
      implicit none
      integer :: i,j,k,it
      real :: dti2
CSMS$DISTRIBUTE(pom_dec,1,2) BEGIN
      real :: qb(im,jm,kb),q(im,jm,kb),qf(im,jm,kb),
     $        xflux(im,jm,kb),yflux(im,jm,kb)
CSMS$DISTRIBUTE END
CSMS$PARALLEL(pom_dec,<i>,<j>) BEGIN
      xflux(:,:,:)=0.e0
      yflux(:,:,:)=0.e0
CSMS$CHECK_HALO(q<1:0,1:0>, 'halo of q in advq')
      do k=2,kbm1
        do j=2,jm
          do i=2,im
            xflux(i,j,k)=.25e0*(q(i,j,k)+q(i-1,j,k))*
     $                   (u(i,j,k)+u(i,j,k-1))
            yflux(i,j,k)=.25e0*(q(i,j,k)+q(i,j-1,k))*
     $                   (v(i,j,k)+v(i,j,k-1))
          enddo
        enddo
      enddo
      do k=2,kbm1
        do j=2,jm
          do i=2,im
            xflux(i,j,k)=xflux(i,j,k)-.5e0*(aam(i,j,k)+
     $                   aam(i-1,j,k)+
     $                   aam(i,j,k-1)+aam(i-1,j,k-1))
     $                   *(qb(i,j,k)-qb(i-1,j,k))*dum(i,j)
     $                   /(dx(i,j)+dx(i-1,j))
            yflux(i,j,k)=yflux(i,j,k)-.5e0*(aam(i,j,k)+
     $                   aam(i,j-1,k)+
     $                   aam(i,j,k-1)+aam(i,j-1,k-1))
     $                   *(qb(i,j,k)-qb(i,j-1,k))*dvm(i,j)
     $                   /(dy(i,j)+dy(i,j-1))
            xflux(i,j,k)=.25e0*(dy(i,j)+dy(i-1,j))
     $                   *(dzz(i,j,k-1)+dzz(i-1,j,k-1))*
     $                   xflux(i,j,k)
            yflux(i,j,k)=.25e0*(dx(i,j)+dx(i,j-1))
     $                   *(dzz(i,j,k-1)+dzz(i,j-1,k-1))*
     $                   yflux(i,j,k)
          enddo
        enddo
      enddo
CSMS$EXCHANGE(xflux<0:1,0:0>, yflux<0:0,0:1>)
      do k=2,kbm1
        do j=2,jmm1
          do i=2,imm1
            qf(i,j,k)=.5e0*( w(i,j,k-1)*q(i,j,k-1)
     $                -w(i,j,k+1)*q(i,j,k+1))*art(i,j)
     $                +xflux(i+1,j,k)-xflux(i,j,k)
     $                +yflux(i,j+1,k)-yflux(i,j,k)
            qf(i,j,k)=((dzb(i,j,k)+dzb(i,j,k-1))*art(i,j)*
     $                qb(i,j,k)
     $                -2.*dti2*qf(i,j,k))/((dzf(i,j,k)+
     $                dzf(i,j,k-1))*art(i,j))
          enddo
        enddo
      enddo
CSMS$COMPARE_VAR(qf, 'advq')
CSMS$PARALLEL END
      return
      end
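The CSMS$EXCHANGE directive in advq requests one halo cell of xflux in the positive-i direction and one of yflux in the positive-j direction, because the final loop reads xflux(i+1,j,k) and yflux(i,j+1,k). A minimal Python sketch (not SMS code; the rule it encodes is only an illustration) of how the stencil offsets determine those halo widths:

```python
# The qf update reads xflux(i+1,j,k) and yflux(i,j+1,k); the halo widths
# requested by CSMS$EXCHANGE(xflux<0:1,0:0>, yflux<0:0,0:1>) follow
# directly from the stencil offsets.
def halo_widths(offsets):
    """Per-axis (low, high) halo widths needed to evaluate a stencil."""
    lows  = [max(0, -min(off[a] for off in offsets)) for a in range(2)]
    highs = [max(0,  max(off[a] for off in offsets)) for a in range(2)]
    return list(zip(lows, highs))

# index offsets (di, dj) at which each array is read in the qf loop
xflux_reads = [(0, 0), (1, 0)]   # xflux(i,j,k) and xflux(i+1,j,k)
yflux_reads = [(0, 0), (0, 1)]   # yflux(i,j,k) and yflux(i,j+1,k)
print(halo_widths(xflux_reads))  # [(0, 1), (0, 0)]  ->  xflux<0:1,0:0>
print(halo_widths(yflux_reads))  # [(0, 0), (0, 1)]  ->  yflux<0:0,0:1>
```

The same reasoning explains CSMS$CHECK_HALO(q<1:0,1:0>, ...): the flux loops read q(i-1,j,k) and q(i,j-1,k), so one halo cell on the low side of each horizontal axis must be valid on entry.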
The IBM SP4 at CASPUR

One IBM 690 server: Power4, 1.3 GHz, 64 GB RAM.
Mediterranean Sea Test Case

grid points (count not preserved in the transcription); land points are unused in the TERRA configuration. Use NOTERRA to skip computation on land points.

The serial code runs for about 15 seconds on a single Power4 for a single time step ( day).
Mediterranean Sea Test Case: salinity and temperature plots up to 10 meters depth (figure not preserved in the transcription).

Mediterranean Sea Test Case: salinity and temperature plots up to 250 meters depth (figure not preserved in the transcription).
Performance results: TERRA

Very good performance: a speedup of almost 21 on 32 CPUs (table not preserved in the transcription).
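The quoted numbers can be unpacked with a little arithmetic: a speedup of 21 on 32 CPUs is about 66% parallel efficiency and, if Amdahl's law is taken at face value, corresponds to a serial fraction of under 2%. A quick check in Python:

```python
# Efficiency and implied Amdahl serial fraction for speedup S = 21 on p = 32 CPUs.
p, S = 32, 21.0
efficiency = S / p                      # fraction of ideal linear speedup
serial_frac = (p / S - 1) / (p - 1)     # invert Amdahl: S = 1/(s + (1-s)/p)
print(f"efficiency  = {efficiency:.1%}")    # ~65.6%
print(f"serial frac = {serial_frac:.1%}")   # ~1.7%
```

Amdahl's law ignores communication and load imbalance, so the true serial fraction is smaller still; the estimate just bounds how well the SMS-generated code scales on this machine.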
Performance results: NOTERRA

The performance degradation can be explained by the reduced amount of computation and increasing load imbalance, but also by cache misses during the EXCHANGE step.
Future Developments

Coupling with higher-resolution sub-models.
Mediterranean Sea circulation

The circulation in the Mediterranean Sea is characterized by different oceanographic features localized in different areas, requiring different resolutions. Examples: the Gibraltar Strait and the Sicily Channel.

Coupling the Mediterranean model (resolution about 10 km) with sub-area models (resolution about 1 km) is an efficient way to combine reasonable computation times with the need for higher resolution in some areas.
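A rough cost estimate shows why nesting pays off. Refining the whole basin from 10 km to 1 km multiplies both horizontal grid dimensions by 10 and, through the CFL condition, shrinks the time step by roughly a factor of 10 as well, so the total work grows by about a factor of 1000. Illustrative Python arithmetic (order-of-magnitude numbers only, not from the talk):

```python
# Cost of uniform refinement vs. nesting (illustrative back-of-envelope).
refine = 10                        # 10 km -> 1 km horizontal resolution
cost_factor = refine**2 * refine   # nx and ny grow 10x each; CFL cuts dt by ~10x
print(cost_factor)                 # ~1000x the work of the coarse run
# With nesting, that factor applies only over the small sub-areas, while
# the rest of the basin stays at the affordable 10 km resolution.
```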
Ligurian Sea coupling example

Three independent interacting processes:
- Process C: baroclinic circulation in the Coarse region
- Process F: baroclinic circulation in the Fine region
- Process D: barotropic circulation in the Entire area at the coarse resolution, and driver of all the communications
Ligurian Sea coupling advantages

- The same model is applied in all areas (only one code is needed).
- All three processes are parallel (SMS parallelization).
- The barotropic computation applied to the entire area limits the communications to the internal time steps (the internal-to-external time step ratio is usually 60) and avoids possible noise at the boundary between the coarse and fine regions.
- We plan to use the Model Coupling Toolkit from the SciDAC project for an efficient parallel coupling.
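The benefit of that ratio can be made concrete: if coupling data had to be exchanged at every external (barotropic) step, each internal step would trigger about 60 communications; handling the barotropic mode globally in the driver process reduces this to one. A trivial Python sketch of the bookkeeping (illustrative only):

```python
# Coupling communications per internal (baroclinic) time step, assuming the
# usual internal-to-external time step ratio of 60 quoted on the slide.
ratio = 60
msgs_naive  = ratio   # couple at every external (barotropic) step
msgs_driver = 1       # driver process D handles the barotropic mode globally
print(msgs_naive // msgs_driver)   # factor of 60 fewer coupling messages
```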
More informationAcuSolve Performance Benchmark and Profiling. October 2011
AcuSolve Performance Benchmark and Profiling October 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox, Altair Compute
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationBenchmark results on Knight Landing architecture
Benchmark results on Knight Landing architecture Domenico Guida, CINECA SCAI (Bologna) Giorgio Amati, CINECA SCAI (Roma) Milano, 21/04/2017 KNL vs BDW A1 BDW A2 KNL cores per node 2 x 18 @2.3 GHz 1 x 68
More informationImplementation of an integrated efficient parallel multiblock Flow solver
Implementation of an integrated efficient parallel multiblock Flow solver Thomas Bönisch, Panagiotis Adamidis and Roland Rühle adamidis@hlrs.de Outline Introduction to URANUS Why using Multiblock meshes
More informationDeveloping the TELEMAC system for HECToR (phase 2b & beyond) Zhi Shang
Developing the TELEMAC system for HECToR (phase 2b & beyond) Zhi Shang Outline of the Talk Introduction to the TELEMAC System and to TELEMAC-2D Code Developments Data Reordering Strategy Results Conclusions
More informationCESM (Community Earth System Model) Performance Benchmark and Profiling. August 2011
CESM (Community Earth System Model) Performance Benchmark and Profiling August 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell,
More informationParallel Programming Concepts. Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04. Parallel Background. Why Bother?
Parallel Programming Concepts Tom Logan Parallel Software Specialist Arctic Region Supercomputing Center 2/18/04 Parallel Background Why Bother? 1 What is Parallel Programming? Simultaneous use of multiple
More informationICON for HD(CP) 2. High Definition Clouds and Precipitation for Advancing Climate Prediction
ICON for HD(CP) 2 High Definition Clouds and Precipitation for Advancing Climate Prediction High Definition Clouds and Precipitation for Advancing Climate Prediction ICON 2 years ago Parameterize shallow
More informationMPAS-O: Plan for 2013
MPAS-O: Plan for 2013 Todd Ringler Theoretical Division Los Alamos National Laboratory Climate, Ocean and Sea-Ice Modeling Project http://public.lanl.gov/ringler/ringler.html Staffing, Effort and Focus
More informationToward An Integrated Cluster File System
Toward An Integrated Cluster File System Adrien Lebre February 1 st, 2008 XtreemOS IP project is funded by the European Commission under contract IST-FP6-033576 Outline Context Kerrighed and root file
More informationIn 1986, I had degrees in math and engineering and found I wanted to compute things. What I ve mostly found is that:
Parallel Computing and Data Locality Gary Howell In 1986, I had degrees in math and engineering and found I wanted to compute things. What I ve mostly found is that: Real estate and efficient computation
More informationOptimizing an Earth Science Atmospheric Application with the OmpSs Programming Model
www.bsc.es Optimizing an Earth Science Atmospheric Application with the OmpSs Programming Model HPC Knowledge Meeting'15 George S. Markomanolis, Jesus Labarta, Oriol Jorba University of Barcelona, Barcelona,
More informationNew Features of HYCOM. Alan J. Wallcraft Naval Research Laboratory. 16th Layered Ocean Model Workshop
New Features of HYCOM Alan J. Wallcraft Naval Research Laboratory 16th Layered Ocean Model Workshop May 23, 2013 Mass Conservation - I Mass conservation is important for climate studies It is a powerfull
More informationCS 475: Parallel Programming Introduction
CS 475: Parallel Programming Introduction Wim Bohm, Sanjay Rajopadhye Colorado State University Fall 2014 Course Organization n Let s make a tour of the course website. n Main pages Home, front page. Syllabus.
More informationBMGTOOLS: A COMMUNITY TOOL TO HANDLE MODEL GRID AND BATHYMETRY
#49-April 2014-94 BMGTOOLS: A COMMUNITY TOOL TO HANDLE MODEL GRID AND BATHYMETRY By S. Theetten (1), B. Thiébault (2), F. Dumas (1), J. Paul (3) 1 2 ARTENUM, Toulouse, France 3 MERCATOR OCEAN, Toulouse,
More informationRunning the FIM and NIM Weather Models on GPUs
Running the FIM and NIM Weather Models on GPUs Mark Govett Tom Henderson, Jacques Middlecoff, Jim Rosinski, Paul Madden NOAA Earth System Research Laboratory Global Models 0 to 14 days 10 to 30 KM resolution
More informationParallel Computing Why & How?
Parallel Computing Why & How? Xing Cai Simula Research Laboratory Dept. of Informatics, University of Oslo Winter School on Parallel Computing Geilo January 20 25, 2008 Outline 1 Motivation 2 Parallel
More informationPerformance Analysis of the MPAS-Ocean Code using HPCToolkit and MIAMI
Performance Analysis of the MPAS-Ocean Code using HPCToolkit and MIAMI Gabriel Marin February 11, 2014 MPAS-Ocean [4] is a component of the MPAS framework of climate models. MPAS-Ocean is an unstructured-mesh
More informationOptimal Combining Data for Improving Ocean Modeling
DISTRIBUTION STATEMENT A: Approved for public release; distribution is unlimited. Optimal Combining Data for Improving Ocean Modeling L.I.Piterbarg University of Southern California Center of Applied Mathematical
More informationCo-array Fortran Performance and Potential: an NPB Experimental Study. Department of Computer Science Rice University
Co-array Fortran Performance and Potential: an NPB Experimental Study Cristian Coarfa Jason Lee Eckhardt Yuri Dotsenko John Mellor-Crummey Department of Computer Science Rice University Parallel Programming
More informationHigh Performance Computing: Architecture, Applications, and SE Issues. Peter Strazdins
High Performance Computing: Architecture, Applications, and SE Issues Peter Strazdins Department of Computer Science, Australian National University e-mail: peter@cs.anu.edu.au May 17, 2004 COMP1800 Seminar2-1
More informationFujitsu s Approach to Application Centric Petascale Computing
Fujitsu s Approach to Application Centric Petascale Computing 2 nd Nov. 2010 Motoi Okuda Fujitsu Ltd. Agenda Japanese Next-Generation Supercomputer, K Computer Project Overview Design Targets System Overview
More informationPerformance Analysis for Large Scale Simulation Codes with Periscope
Performance Analysis for Large Scale Simulation Codes with Periscope M. Gerndt, Y. Oleynik, C. Pospiech, D. Gudu Technische Universität München IBM Deutschland GmbH May 2011 Outline Motivation Periscope
More informationWRF Software Architecture. John Michalakes, Head WRF Software Architecture Michael Duda Dave Gill
WRF Software Architecture John Michalakes, Head WRF Software Architecture Michael Duda Dave Gill Outline Introduction Computing Overview WRF Software Overview Introduction WRF Software Characteristics
More informationMPI Casestudy: Parallel Image Processing
MPI Casestudy: Parallel Image Processing David Henty 1 Introduction The aim of this exercise is to write a complete MPI parallel program that does a very basic form of image processing. We will start by
More informationOpenFOAM on BG/Q porting and performance
OpenFOAM on BG/Q porting and performance Paride Dagna, SCAI Department, CINECA SYSTEM OVERVIEW OpenFOAM : selected application inside of PRACE project Fermi : PRACE Tier- System Model: IBM-BlueGene /Q
More informationCAF versus MPI Applicability of Coarray Fortran to a Flow Solver
CAF versus MPI Applicability of Coarray Fortran to a Flow Solver Manuel Hasert, Harald Klimach, Sabine Roller m.hasert@grs-sim.de Applied Supercomputing in Engineering Motivation We develop several CFD
More informationMaximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
More informationICON Performance Benchmark and Profiling. March 2012
ICON Performance Benchmark and Profiling March 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: Intel, Dell, Mellanox Compute resource - HPC
More informationCUDA GPGPU Workshop 2012
CUDA GPGPU Workshop 2012 Parallel Programming: C thread, Open MP, and Open MPI Presenter: Nasrin Sultana Wichita State University 07/10/2012 Parallel Programming: Open MP, MPI, Open MPI & CUDA Outline
More informationMPI and OpenMP Paradigms on Cluster of SMP Architectures: the Vacancy Tracking Algorithm for Multi-Dimensional Array Transposition
MPI and OpenMP Paradigms on Cluster of SMP Architectures: the Vacancy Tracking Algorithm for Multi-Dimensional Array Transposition Yun He and Chris H.Q. Ding NERSC Division, Lawrence Berkeley National
More informationScalable Dynamic Load Balancing of Detailed Cloud Physics with FD4
Center for Information Services and High Performance Computing (ZIH) Scalable Dynamic Load Balancing of Detailed Cloud Physics with FD4 Minisymposium on Advances in Numerics and Physical Modeling for Geophysical
More informationCMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)
CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading) Limits to ILP Conflicting studies of amount of ILP Benchmarks» vectorized Fortran FP vs. integer
More informationCompilation for Heterogeneous Platforms
Compilation for Heterogeneous Platforms Grid in a Box and on a Chip Ken Kennedy Rice University http://www.cs.rice.edu/~ken/presentations/heterogeneous.pdf Senior Researchers Ken Kennedy John Mellor-Crummey
More informationNumerical Ocean Modeling and Simulation with CUDA
Numerical Ocean Modeling and Simulation with CUDA Jason Mak California Polytechnic State University Email: jamak@calpoly.edu Paul Choboter California Polytechnic State University Email: pchobote@calpoly.edu
More informationOcean Simulations using MPAS-Ocean
Ocean Simulations using MPAS-Ocean Mark Petersen and the MPAS-Ocean development team Los Alamos National Laboratory U N C L A S S I F I E D Slide 1 Progress on MPAS-Ocean in 2010 MPAS-Ocean is a functioning
More informationNEMO-MED: OPTIMIZATION AND IMPROVEMENT OF SCALABILITY
Research Papers Issue RP0096 January 2011 Scientific Computing and Operation (SCO) NEMO-MED: OPTIMIZATION AND IMPROVEMENT OF SCALABILITY By Italo Epicoco University of Salento, Italy italo.epicoco@unisalento.it
More informationOutline EEL 5764 Graduate Computer Architecture. Chapter 3 Limits to ILP and Simultaneous Multithreading. Overcoming Limits - What do we need??
Outline EEL 7 Graduate Computer Architecture Chapter 3 Limits to ILP and Simultaneous Multithreading! Limits to ILP! Thread Level Parallelism! Multithreading! Simultaneous Multithreading Ann Gordon-Ross
More informationDRAFT. ATMET Technical Note. Common Customizations for RAMS v6.0. Number 2. Prepared by: ATMET, LLC PO Box Boulder, Colorado
DRAFT ATMET Technical Note Number 2 Common Customizations for RAMS v60 Prepared by: ATMET, LLC PO Box 19195 Boulder, Colorado 80308-2195 October 2006 Table of Contents 1 Adding Array Space to the Model
More informationAnaliza in optimizacija implementacije modela NAPOM za Morsko biološko Postajo.
Inštitut Jožef Stefan 1000 Ljubljana, Jamova 39 Odsek E6 - KOMUNIKACIJSKI SISTEMI IJS-DP-11167 Analiza in optimizacija implementacije modela NAPOM za Morsko biološko Postajo. M. Depolli, J. Ugovšek, R.
More informationBenefits of Programming Graphically in NI LabVIEW
Benefits of Programming Graphically in NI LabVIEW Publish Date: Jun 14, 2013 0 Ratings 0.00 out of 5 Overview For more than 20 years, NI LabVIEW has been used by millions of engineers and scientists to
More informationBenefits of Programming Graphically in NI LabVIEW
1 of 8 12/24/2013 2:22 PM Benefits of Programming Graphically in NI LabVIEW Publish Date: Jun 14, 2013 0 Ratings 0.00 out of 5 Overview For more than 20 years, NI LabVIEW has been used by millions of engineers
More information