3D Parallel FEM (III) Parallel Visualization using ppopen-math/vis

1 3D Parallel FEM (III): Parallel Visualization using ppOpen-MATH/VIS
Kengo Nakajima
Technical & Scientific Computing I ( ) / Seminar on Computer Science I ( )
Parallel FEM

2 ppOpen-HPC: Open Source Infrastructure for Development and Execution of Large-Scale Scientific Applications with Automatic Tuning (AT)
Kengo Nakajima (Information Technology Center, The University of Tokyo)
Masaki Satoh (AORI/U. Tokyo), Takashi Furumura (ERI/U. Tokyo), Hiroshi Okuda (GS Frontier Sciences/U. Tokyo), Takeshi Iwashita (ACCMS/Kyoto U.), Hide Sakaguchi (JAMSTEC)

3 Post T2K System
- PFLOPS, FY.2015
- Many-core based (e.g. Intel MIC/Xeon Phi only)
- Joint Center for Advanced High Performance Computing (JCAHPC, University of Tsukuba & University of Tokyo)
- Programming is still difficult, although the Intel compiler works (MPI + OpenMP).
- Tuning for performance (e.g. prefetching) is essential.
- Some framework for helping users is needed.

4 Key Issues for Applications/Algorithms towards Post-Peta & Exa Computing
Jack Dongarra (ORNL/U. Tennessee) at ISC 2013
- Heterogeneous/Hybrid Architecture
- Communication/Synchronization-Reducing Algorithms
- Mixed Precision Computation
- Auto-Tuning/Self-Adapting
- Fault-Resilient Algorithms
- Reproducibility of Results

5 ppOpen-HPC (1/3)
- Open source infrastructure for development and execution of large-scale scientific applications on post-peta-scale supercomputers with automatic tuning (AT)
- pp: post-peta-scale
- Five-year project (since April 2011)
- P.I.: Kengo Nakajima (ITC, The University of Tokyo)
- Part of "Development of System Software Technologies for Post-Peta Scale High Performance Computing", funded by JST/CREST (Japan Science and Technology Agency, Core Research for Evolutional Science and Technology); 4.5 M$ for 5 yr.
- Team with 6 institutes, >30 people (5 PDs) from various fields: co-design
  - ITC/U.Tokyo, AORI/U.Tokyo, ERI/U.Tokyo, FS/U.Tokyo, Kyoto U., JAMSTEC

6 (figure)

7 ppOpen-HPC (2/3)
- ppOpen-HPC consists of various types of optimized libraries, which cover various types of procedures for scientific computations.
  - ppOpen-APPL/FEM, FDM, FVM, BEM, DEM
- Source code developed on a PC with a single processor is linked with these libraries, and the generated parallel code is optimized for post-peta-scale systems.
- Users don't have to worry about optimization, tuning, parallelization, etc.
- MPI, OpenMP (OpenACC)

8 (figure)

9 ppOpen-HPC covers (figure)

10 ppOpen-APPL
A set of libraries corresponding to each of the five methods noted above (FEM, FDM, FVM, BEM, DEM), providing:
- I/O: netCDF-based interface
- Domain-to-domain communications (a generic halo-exchange sketch follows this list)
- Optimized linear solvers (preconditioned iterative solvers), optimized for each discretization method
- Matrix assembling
- AMR and dynamic load balancing
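The domain-to-domain communication item above is, in essence, a halo (external-node) exchange over the distributed local meshes. The following is a minimal generic sketch only, not the ppOpen-HPC implementation; the array names (NEIBPETOT, NEIBPE, IMPORT_INDEX/IMPORT_ITEM, EXPORT_INDEX/EXPORT_ITEM) follow the distributed local mesh arrays that appear in the pFEM3D example later in this material, assumed here to be 0-based CSR-style tables.

/* Generic halo exchange for one nodal value array (sketch, not ppOpen-HPC code). */
#include <mpi.h>
#include <stdlib.h>

void halo_exchange(double *val, int NEIBPETOT, const int *NEIBPE,
                   const int *IMPORT_INDEX, const int *IMPORT_ITEM,
                   const int *EXPORT_INDEX, const int *EXPORT_ITEM)
{
  MPI_Request *req = malloc(2 * NEIBPETOT * sizeof(MPI_Request));
  double *sendbuf  = malloc(EXPORT_INDEX[NEIBPETOT] * sizeof(double));
  double *recvbuf  = malloc(IMPORT_INDEX[NEIBPETOT] * sizeof(double));

  for (int n = 0; n < NEIBPETOT; n++) {
    /* pack boundary-node values destined for neighbor NEIBPE[n] */
    for (int k = EXPORT_INDEX[n]; k < EXPORT_INDEX[n + 1]; k++)
      sendbuf[k] = val[EXPORT_ITEM[k]];
    MPI_Isend(&sendbuf[EXPORT_INDEX[n]], EXPORT_INDEX[n + 1] - EXPORT_INDEX[n],
              MPI_DOUBLE, NEIBPE[n], 0, MPI_COMM_WORLD, &req[2 * n]);
    MPI_Irecv(&recvbuf[IMPORT_INDEX[n]], IMPORT_INDEX[n + 1] - IMPORT_INDEX[n],
              MPI_DOUBLE, NEIBPE[n], 0, MPI_COMM_WORLD, &req[2 * n + 1]);
  }
  MPI_Waitall(2 * NEIBPETOT, req, MPI_STATUSES_IGNORE);

  /* unpack received values into the external (halo) nodes */
  for (int n = 0; n < NEIBPETOT; n++)
    for (int k = IMPORT_INDEX[n]; k < IMPORT_INDEX[n + 1]; k++)
      val[IMPORT_ITEM[k]] = recvbuf[k];

  free(req); free(sendbuf); free(recvbuf);
}

A library layer such as ppOpen-APPL hides exactly this kind of bookkeeping behind a single call.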

11 FEM Code on ppOpen-HPC
Optimization/parallelization can be hidden from application developers.

program My_pFEM
  use ppOpenFEM_util
  use ppOpenFEM_solver

  call ppOpenFEM_init
  call ppOpenFEM_cntl
  call ppOpenFEM_mesh
  call ppOpenFEM_mat_init

  do                              ! time-marching loop
    call Users_FEM_mat_ass        ! user-supplied matrix assembly
    call Users_FEM_mat_bc         ! user-supplied boundary conditions
    call ppOpenFEM_solve
    call ppOpenFEM_vis
    Time = Time + DT
  enddo

  call ppOpenFEM_finalize
  stop
end

12 ppOpen-HPC (2/3) (same content as slide 7)

13 ppOpen-HPC (3/3)
- The capability of automatic tuning (AT) enables development of optimized codes and libraries on emerging architectures, based on results from existing architectures and on machine parameters.
  - Memory access, host/co-processor balance, computation/communication overlapping
  - Solvers & libraries in ppOpen-HPC, OpenFOAM, PETSc, etc.
- The target system is the Post T2K system.
  - PFLOPS, FY.2015, many-core based (e.g. Intel MIC/Xeon Phi)
  - ppOpen-HPC helps the smooth transition of users to the new system.

14 (figure)

15 ppOpen-MATH
A set of common numerical libraries:
- Multigrid solvers (ppOpen-MATH/MG)
- Parallel graph libraries (ppOpen-MATH/GRAPH)
- Parallel visualization (ppOpen-MATH/VIS)
- Library for coupled multi-physics simulations (loose coupling) (ppOpen-MATH/MP)
  - Originally developed as a coupler for NICAM (atmosphere, unstructured) and COCO (ocean, structured) in global climate simulations using the K computer; both codes are major codes on the K computer.
    - Prof. Masaki Satoh (AORI/U.Tokyo): NICAM
    - Prof. Hiroyasu Hasumi (AORI/U.Tokyo): COCO
  - The developed coupler has been extended to more general use, e.g. coupled seismic simulations.

16 ppOpen-MATH/MP coupler (figure, c/o T. Arakawa, M. Satoh)
The J-cup / ppOpen-MATH/MP coupler connects atmospheric models (NICAM A-grid/ZM-grid: semi-unstructured grid; MIROC-A: FDM/structured grid) with ocean models (COCO: tri-polar FDM; regional COCO; Matsumura model; non-hydrostatic regional model), providing grid transformation, multi-ensemble support, I/O, pre- and post-processing, fault tolerance, and M:N process coupling, targeting the post-peta-scale system (system software, architecture).

17 Large-Scale Coupled Simulations in FY2013 (c/o T. Furumura)
Challenge (FY2013): a test of a coupled simulation of FDM (regular grid) and FEM (unstructured grid) using the newly developed ppOpen-MATH/MP coupler.
- FDM: seismic wave propagation
  - Model size: 80 x 80 x 400 km; time: 240 s
  - Resolution (space): 0.1 km (regular); resolution (time): 5 ms (effective frequency < 1 Hz)
- FEM: building response
  - Model size: 400 x 400 x 200 m; time: 60 s
  - Resolution (space): 1 m; resolution (time): 1 ms
- ppOpen-MATH/MP: space-time interpolation, mapping between the FDM and FEM meshes, etc. (a minimal sketch of the temporal part follows)
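Because the two codes advance with different time steps (5 ms on the FDM side, 1 ms on the FEM side), the coupler has to interpolate the exchanged field in time as well as in space. The following is a minimal sketch, assuming plain linear interpolation between two consecutive FDM snapshots; the actual ppOpen-MATH/MP remapping is more general and also performs the spatial FDM-to-FEM mapping, so the function name and arguments here are purely illustrative.

/* Linear interpolation in time between two FDM snapshots u0 (at time t0)
 * and u1 (at time t0 + DT_FDM), evaluated at the FEM time t (sketch only). */
#define DT_FDM 0.005    /* FDM time step: 5 ms */

void interp_in_time(const double *u0, const double *u1, double *u_fem,
                    int n, double t0, double t)
{
  double w = (t - t0) / DT_FDM;     /* 0 <= w <= 1 within one FDM interval */
  for (int i = 0; i < n; i++)
    u_fem[i] = (1.0 - w) * u0[i] + w * u1[i];
}

With a 1 ms FEM step, the coupler would evaluate this at w = 0.2, 0.4, ..., 1.0 within each 5 ms FDM interval.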

18 Schedule of Public Release (with English Documents)
We are now focusing on MIC/Xeon Phi.
- 4Q 2012 (Ver.0.1.0): ppOpen-HPC for multicore clusters (Cray, K, etc.); preliminary version of ppOpen-AT/STATIC
- 4Q 2013 (Ver.0.2.0): ppOpen-HPC for multicore clusters & Xeon Phi (& GPU); available at SC13
- 4Q 2014: prototype of ppOpen-HPC for the post-peta-scale system
- 4Q 2015: final version of ppOpen-HPC for the post-peta-scale system; further optimization on the target system

19 ppOpen-HPC Ver.0.1.0
- Released at SC12 (can also be downloaded)
- Multicore cluster version (Flat MPI, OpenMP/MPI hybrid) with documents in English
- Collaborations with scientists

Component: archive
- ppOpen-APPL/FDM: ppohFDM_0.1.0
- ppOpen-APPL/FVM: ppohFVM_0.1.0
- ppOpen-APPL/FEM: ppohFEM_0.1.0
- ppOpen-APPL/BEM: ppohBEM_0.1.0
- ppOpen-APPL/DEM: ppohDEM_0.1.0
- ppOpen-MATH/VIS: ppohVIS_FDM3D_0.1.0
- ppOpen-AT/STATIC: ppohAT_0.1.0

20 What is new in Ver.0.2.0? Available at SC13 (can also be downloaded)
Component: new development
- ppOpen-APPL/FDM: OpenMP/MPI hybrid parallel programming model; Intel Xeon Phi version; interface for ppOpen-MATH/VIS-FDM3D
- ppOpen-APPL/FVM: optimized communication
- ppOpen-APPL/FEM: sample implementations for dynamic solid mechanics
- ppOpen-MATH/MP-PP: tool for generation of the remapping table in ppOpen-MATH/MP
- ppOpen-MATH/VIS: optimized ppOpen-MATH/VIS-FDM3D
- ppOpen-AT/STATIC: sequence of statements, loop splitting (optimized ppOpen-APPL/FVM, ppOpen-APPL/FDM, BEM)

21 ppOpen-MATH/VIS
- Parallel visualization using information of background voxels [Nakajima & Chen 2006]
- The FDM version is released: ppOpen-MATH/VIS-FDM3D
  - Output: a single UCD file
  - Platform: T2K, Cray, FX10; Flat MPI
  - Unstructured/hybrid version: next release
- Control parameters ([Refine] section):
  - AvailableMemory = 2.0   Available memory size (GB); not used in this version
  - MaxVoxelCount = 500     Maximum number of voxels
  - MaxRefineLevel = 20     Maximum number of refinement levels

22 Simplified Parallel Visualization using Background Voxels
- Octree-based AMR
- AMR is applied to regions where the gradient of the field values is large: stress concentration, shock waves, separation, etc.
- If the number of voxels is controlled, a single file with about 10^5 meshes is possible, even though the entire problem size is 10^9 with distributed data sets. (A minimal refinement sketch follows.)
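The refinement decision above can be pictured as follows. This is a minimal sketch, assuming a precomputed field-gradient indicator per voxel and a hypothetical Voxel structure; only MaxVoxelCount and MaxRefineLevel correspond to the ppOpen-MATH/VIS control parameters shown on the previous slide, while the threshold and helper logic are illustrative and not the library's actual algorithm.

/* Octree-style background-voxel refinement driven by a field-gradient
 * indicator (illustrative sketch, not ppOpen-MATH/VIS code). */
#include <stdlib.h>

typedef struct Voxel {
  double grad;             /* field-gradient magnitude in this voxel       */
  int    level;            /* refinement level (0 = initial background)    */
  struct Voxel *child[8];  /* octants after refinement; NULL while a leaf  */
} Voxel;

/* Split one leaf voxel if the local gradient is large, the level cap is
 * not reached, and the global voxel budget allows 8 more voxels.
 * Returns 1 if the voxel was refined. */
int refine(Voxel *v, double grad_threshold,
           int MaxRefineLevel, int *voxel_count, int MaxVoxelCount)
{
  if (v->level >= MaxRefineLevel)       return 0;  /* level cap reached    */
  if (v->grad  <  grad_threshold)       return 0;  /* smooth region: keep  */
  if (*voxel_count + 8 > MaxVoxelCount) return 0;  /* voxel budget reached */

  for (int i = 0; i < 8; i++) {
    v->child[i] = calloc(1, sizeof(Voxel));
    v->child[i]->level = v->level + 1;
    v->child[i]->grad  = v->grad;   /* child gradients are recomputed later */
  }
  *voxel_count += 8;
  return 1;
}

Capping the total count (MaxVoxelCount) is what keeps the merged visualization output small enough for a single file.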

23 FEM Mesh (SW Japan Model)

24 Voxel Mesh (initial)

25 Voxel Mesh (2-level adapted)

26 Example of Surface Simplification
Initial (11,884 triangles), 50% reduction (5,942), 95% reduction (594), 98% reduction (238)

27 pfem-vis: pFEM3D + ppOpen-MATH/VIS (Files)

>$ cd <$O-fem2>
>$ cp /home/z30088/fem2/c/fem3dv.tar .
>$ tar xvf fem3dv.tar
>$ cd <$O-fem2>/fem3dv/src
>$ cd ../run
>$ pjsub go.sh

28 pfem-vis: Makefile

PPOHVISDIR = /home/z30088/fem2/ppohvis_pfem3dstr_140115/ppohvis/
LIB_DIR = $(PPOHVISDIR)/lib
INC_DIR = $(PPOHVISDIR)/include
CFLAGSL = -I/home/z30088/ppohVIS_test/include
LDFLAGSL = -L/home/z30088/ppohVIS_test/lib
LIBSL = -lppohvisfdm3d
OPTFLAGS = -Kfast -I$(INC_DIR)
LFLAGS = -L$(LIB_DIR) $(LIBS) -lm
#
.SUFFIXES:
.SUFFIXES: .o .c
.c.o:
	$(CC) $(OPTFLAGS) -c $< -o $@
#
TARGET = ../run/sol
default: $(TARGET)
OBJS = test1_p.o pfem_init.o ...
$(TARGET): $(OBJS)
	$(CC) $(OPTFLAGS) -o $@ $(OBJS) $(LFLAGS)
clean:
	/bin/rm -f *.o *~ $(TARGET)

29 pfem-vis: <$O-fem2>/fem3dv/run

Files in the run directory:
cube_32x32x32_8pe.0
cube_32x32x32_8pe.1
cube_32x32x32_8pe.2
cube_32x32x32_8pe.3
cube_32x32x32_8pe.4
cube_32x32x32_8pe.5
cube_32x32x32_8pe.6
cube_32x32x32_8pe.7
go.sh
INPUT.DAT
vis.cnt

Mesh file header: cube_32x32x32_8pe

go.sh:
#!/bin/sh
#PJM -L "node=1"
#PJM -L "elapse=00:10:00"
#PJM -L "rscgrp=lecture"
#PJM -g "gt74"
#PJM -j
#PJM -o "cube8.lst"
#PJM --mpi "proc=8"
mpiexec ./sol

30 pfem-vis: pFEM3D + ppOpen-MATH/VIS (data flow)
sol (linked with ppohVIS) reads INPUT.DAT, the distributed local mesh files <HEADER>.*, and vis.cnt; its output files vis_disp.1.inp and vis_sigma-zz.1.inp are then loaded into ParaView.

31 pfem-vis: C/main (1/2)

#include <stdio.h>
#include <stdlib.h>
FILE* fp_log;
#define GLOBAL_VALUE_DEFINE
#include "pfem_util.h"
#include "ppohvis_pfem3d_util.h"

extern void PFEM_INIT(int,char**);
extern void INPUT_CNTL();
extern void INPUT_GRID();
extern void MAT_CON0();
extern void MAT_CON1();
extern void MAT_ASS_MAIN();
extern void MAT_ASS_BC();
extern void SOLVE33();
extern void RECOVER_STRESS();
extern void PFEM_FINALIZE();

int main(int argc, char* argv[])
{
  double START_TIME, END_TIME;
  struct ppohvis_base_stcontrol *pcontrol = NULL;
  struct ppohvis_base_stresultcollection *pnoderesult = NULL;

  PFEM_INIT(argc, argv);                            /* initialize pFEM3D               */
  ppohvis_pfem3d_init(MPI_COMM_WORLD);              /* initialize ppohVIS              */

  INPUT_CNTL();
  pcontrol = ppohvis_pfem3d_getcontrol("vis.cnt");  /* read visualization control file */

  INPUT_GRID();
  /* pass the distributed local mesh (nodes, elements, communication tables) to ppohVIS */
  if(ppohvis_pfem3d_setmeshex(
       NP, N, NODE_ID, XYZ,
       ICELTOT, ICELTOT_INT, ELEM_ID, ICELNOD,
       NEIBPETOT, NEIBPE, IMPORT_INDEX, IMPORT_ITEM, EXPORT_INDEX, EXPORT_ITEM)) {
    ppohvis_base_printerror(stderr);
    MPI_Abort(MPI_COMM_WORLD, errno);
  };

32 pfem-vis: C/main (2/2)

  SOLVE33();
  RECOVER_STRESS();

  pnoderesult = ppohvis_base_allocateresultcollection();
  if(pnoderesult == NULL) {
    ppohvis_base_printerror(stderr);
    MPI_Abort(MPI_COMM_WORLD, errno);
  };

  /* register nodal results: displacement (3 components) and the sigma-zz component */
  pnoderesult->results[0] =
    ppohvis_pfem3d_convresultnodeitem(NP, 3, "disp", X);
  pnoderesult->results[1] =
    ppohvis_pfem3d_convresultnodeitempart(NP, 3, 2, "sgm-zz", SIGMA_N);

  /* generate the simplified visualization files (prefix "vis") */
  if(ppohvis_pfem3d_visualize(pnoderesult, NULL, pcontrol, "vis", 1)) {
    ppohvis_base_printerror(stderr);
    MPI_Abort(MPI_COMM_WORLD, errno);
  };

  ppohvis_pfem3d_finalize();
  PFEM_FINALIZE();
}

33 pfem-vis: vis.cnt

[Refine]                  Section for refinement control
AvailableMemory = 2.0     Available memory size (GB); not in use
MaxVoxelCount = 1000      Max voxel #
MaxRefineLevel = 20       Max voxel refinement level
[Simple]                  Section for simplification control
ReductionRate = 0.0       Reduction rate of surface patches

Original mesh: 32,768 elements, 39,949 nodes; visualization output: 554 elements, 828 nodes.
Deformed-shape figures are not supported.
