3D Parallel FEM (III) Parallel Visualization using ppopen-math/vis
1 3D Parallel FEM (III): Parallel Visualization using ppopen-math/vis. Kengo Nakajima. Technical & Scientific Computing I ( ), Seminar on Computer Science I ( ).
2 ppopen-hpc Open Source Infrastructure for Development and Execution of Large-Scale Scientific Applications with Automatic Tuning (AT) Kengo Nakajima Information Technology Center Masaki Satoh (AORI/U. Tokyo), Takashi Furumura (ERI/U. Tokyo) Hiroshi Okuda (GS Frontier Sciences/U. Tokyo), Takeshi Iwashita (ACCMS/Kyoto U.) Hide Sakaguchi (JAMSTEC)
3 Post T2K System: PFLOPS, FY.2015. Many-core based (e.g., Intel MIC/Xeon Phi only). Joint Center for Advanced High Performance Computing (JCAHPC, University of Tsukuba & University of Tokyo). Programming is still difficult, although the Intel compiler works (MPI + OpenMP). Tuning for performance (e.g., prefetching) is essential. Some framework for helping users is needed.
4 Key Issues for Applications/Algorithms towards Post-Peta & Exa Computing (Jack Dongarra, ORNL/U. Tennessee, at ISC 2013): heterogeneous/hybrid architectures; communication/synchronization-reducing algorithms; mixed-precision computation; auto-tuning/self-adapting; fault-resilient algorithms; reproducibility of results.
5 ppopen-hpc (1/3): Open-source infrastructure for development and execution of large-scale scientific applications on post-peta-scale supercomputers with automatic tuning (AT). "pp" stands for post-peta-scale. Five-year project (FY ), since April 2011. P.I.: Kengo Nakajima (ITC, The University of Tokyo). Part of "Development of System Software Technologies for Post-Peta Scale High Performance Computing", funded by JST/CREST (Japan Science and Technology Agency, Core Research for Evolutional Science and Technology); 4.5 M$ for 5 yr. Team with 6 institutes, >30 people (5 PDs) from various fields (co-design): ITC/U.Tokyo, AORI/U.Tokyo, ERI/U.Tokyo, FS/U.Tokyo, Kyoto U., JAMSTEC.
7 ppopen-hpc (2/3): ppopen-hpc consists of various types of optimized libraries, which cover various types of procedures for scientific computations (ppopen-appl/fem, FDM, FVM, BEM, DEM). Source code developed on a PC with a single processor is linked with these libraries, and the generated parallel code is optimized for post-peta-scale systems. Users don't have to worry about optimization, tuning, parallelization, etc. Based in part on MPI and OpenMP (OpenACC).
9 ppopen-hpc covers (figure)
10 ppopen-appl: A set of libraries corresponding to each of the five methods noted above (FEM, FDM, FVM, BEM, DEM), providing: I/O (NetCDF-based interface); domain-to-domain communications; optimized linear solvers (preconditioned iterative solvers, optimized for each discretization method); matrix assembling; AMR and dynamic load balancing.
11 FEM Code on ppopen-hpc: optimization/parallelization can be hidden from application developers.

Program My_pFEM
  use ppopenfem_util
  use ppopenfem_solver
  call ppopenfem_init
  call ppopenfem_cntl
  call ppopenfem_mesh
  call ppopenfem_mat_init
  do
    call Users_FEM_mat_ass
    call Users_FEM_mat_bc
    call ppopenfem_solve
    call ppopenfem_vis
    Time = Time + DT
  enddo
  call ppopenfem_finalize
  stop
end
12 ppopen-hpc (2/3): ppopen-hpc consists of various types of optimized libraries, which cover various types of procedures for scientific computations (ppopen-appl/fem, FDM, FVM, BEM, DEM). Source code developed on a PC with a single processor is linked with these libraries, and the generated parallel code is optimized for post-peta-scale systems. Users don't have to worry about optimization, tuning, parallelization, etc. Based in part on MPI and OpenMP (OpenACC).
13 ppopen-hpc (3/3): The capability of automatic tuning (AT) enables development of optimized codes and libraries on emerging architectures, based on results on existing architectures and on machine parameters: memory access, host/co-processor balance, computation/communication overlapping. Targets: solvers & libraries in ppopen-hpc, OpenFOAM, PETSc, etc. The target system is the Post T2K system (PFLOPS, FY.2015; many-core based, e.g. Intel MIC/Xeon Phi). ppopen-hpc helps smooth transition of users to the new system.
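The selection step behind automatic tuning can be illustrated with a toy sketch (all names here are hypothetical, not the ppopen-at API): candidate loop-blocking sizes for a simple transposed-copy kernel are timed, and the fastest is kept as the tuned parameter.

```c
#include <assert.h>
#include <time.h>

#define N 256
static double a[N][N], b[N][N];

/* Run one blocked variant of the kernel and return its elapsed clock ticks. */
static double run_variant(int blk)
{
    clock_t t0 = clock();
    for (int ii = 0; ii < N; ii += blk)
        for (int jj = 0; jj < N; jj += blk)
            for (int i = ii; i < ii + blk && i < N; i++)
                for (int j = jj; j < jj + blk && j < N; j++)
                    a[i][j] = 2.0 * b[j][i];   /* transposed access pattern */
    return (double)(clock() - t0);
}

/* Time each candidate block size and return the fastest one. */
int pick_best_block(void)
{
    static const int candidates[] = {8, 16, 32, 64};
    int best = candidates[0];
    double best_t = run_variant(best);
    for (int k = 1; k < 4; k++) {
        double t = run_variant(candidates[k]);
        if (t < best_t) { best_t = t; best = candidates[k]; }
    }
    return best;
}
```

The real ppopen-at framework generates and selects such variants automatically at install/run time rather than requiring hand-written timing loops.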
15 ppopen-math: A set of common numerical libraries: multigrid solvers (ppopen-math/mg), parallel graph libraries (ppopen-math/graph), parallel visualization (ppopen-math/vis), and a library for coupled multi-physics simulations with loose coupling (ppopen-math/mp). The coupler was originally developed for NICAM (atmosphere, unstructured) and COCO (ocean, structured) in global climate simulations on the K computer; both are major codes on the K computer (Prof. Masaki Satoh, AORI/U.Tokyo: NICAM; Prof. Hiroyasu Hasumi, AORI/U.Tokyo: COCO). The developed coupler has been extended to more general use, e.g. coupled seismic simulations.
16 (Figure) The ppopen-math/mp coupler (J-cup) connecting atmospheric models (NICAM: semi-unstructured grid, A-grid and ZM-grid variants; MIROC-A: FDM/structured grid) with ocean models (COCO: tri-polar FDM; regional COCO; Matsumura model, a regional non-hydrostatic ocean model) on a post-peta-scale system (system S/W, architecture). The coupler provides grid transformation, multi-ensemble I/O, pre- and post-processing, fault tolerance, and M x N coupling. c/o T. Arakawa, M. Satoh
17 Large-Scale Coupled Simulations. Challenge (FY2013): a test of a coupled simulation of FDM (regular grid) and FEM (unstructured grid) using the newly developed ppopen-math/mp coupler (c/o T. Furumura). FDM: seismic wave propagation; model size 80x80x400 km; time 240 s; resolution (space) 0.1 km (regular); resolution (time) 5 ms (effective freq. < 1 Hz). FEM: building response; model size 400x400x200 m; time 60 s; resolution (space) 1 m; resolution (time) 1 ms. ppopen-math/mp: spatial/temporal interpolation, mapping between the FDM and FEM meshes, etc.
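Because the FDM side advances in 5 ms steps while the FEM side needs boundary values every 1 ms, the coupler has to interpolate in time between exchanged fields. A minimal sketch of such linear time interpolation (illustrative only; the function name is not the actual ppopen-math/mp API):

```c
#include <assert.h>

/* Linearly interpolate a field value between two coupler exchange
 * instants t0 and t1 (seconds) to an intermediate time t.
 * Applied pointwise after the spatial FDM-to-FEM mesh mapping. */
double interp_time(double v0, double v1, double t0, double t1, double t)
{
    double w = (t - t0) / (t1 - t0);   /* 0 at t0, 1 at t1 */
    return (1.0 - w) * v0 + w * v1;
}
```

For example, a value exchanged at t0 = 0 s and t1 = 0.005 s (the 5 ms FDM step) can be evaluated at each 1 ms FEM substep in between.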
18 Schedule of Public Release (with English documents). We are now focusing on MIC/Xeon Phi.
4Q 2012 (Ver.0.1.0): ppopen-hpc for multicore clusters (Cray, K, etc.); preliminary version of ppopen-at/static.
4Q 2013 (Ver.0.2.0): ppopen-hpc for multicore clusters & Xeon Phi (& GPU); available at SC13.
4Q 2014: prototype of ppopen-hpc for post-peta-scale systems.
4Q 2015: final version of ppopen-hpc for post-peta-scale systems; further optimization on the target system.
19 ppopen-hpc Ver.0.1.0: released at SC12 (or can be downloaded). Multicore cluster version (Flat MPI, OpenMP/MPI hybrid) with documents in English; collaborations with scientists. Components and archives (each provided in some combination of Flat MPI / OpenMP-MPI hybrid and C / Fortran):
ppopen-appl/fdm: ppohfdm_0.1.0
ppopen-appl/fvm: ppohfvm_0.1.0
ppopen-appl/fem: ppohfem_0.1.0
ppopen-appl/bem: ppohbem_0.1.0
ppopen-appl/dem: ppohdem_0.1.0
ppopen-math/vis: ppohvis_fdm3d_0.1.0
ppopen-at/static: ppohat_0.1.0
20 What is new in Ver.0.2.0? Available at SC13 (or can be downloaded).
ppopen-appl/fdm: OpenMP/MPI hybrid parallel programming model; Intel Xeon Phi version; interface for ppopen-math/vis-fdm3d.
ppopen-appl/fvm: optimized communication.
ppopen-appl/fem: sample implementations for dynamic solid mechanics.
ppopen-math/mp-pp: tool for generation of remapping tables in ppopen-math/mp.
ppopen-math/vis: optimized ppopen-math/vis-fdm3d.
ppopen-at/static: sequence of statements, loop splitting (optimized ppopen-appl/fvm, ppopen-appl/fdm, BEM).
21 ppopen-math/vis: Parallel visualization using information of background voxels [Nakajima & Chen 2006]. The FDM version is released as ppopen-math/vis-fdm3d; output is a single UCD file; platforms: T2K, Cray, FX10; Flat MPI. An unstructured/hybrid version is planned for the next release. Example control parameters:
[Refine]
AvailableMemory = 2.0 (available memory size in GB; not used in this version)
MaxVoxelCount = 500 (maximum number of voxels)
MaxRefineLevel = 20 (maximum number of refinement levels)
22 SDSC: Simplified parallel visualization using background voxels. Octree-based AMR: refinement is applied to regions where the gradient of the field values is large (stress concentration, shock waves, separation, etc.). If the number of voxels is controlled, a single output file with about 10^5 meshes is possible, even though the entire problem size is 10^9 with distributed data sets.
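The refinement idea can be sketched in a simplified 1D form (illustrative only; the library operates on 3D octree voxels, and these names are not its API): a cell is split while the field variation across it exceeds a tolerance, up to a maximum refinement level.

```c
#include <assert.h>
#include <math.h>

typedef double (*field_fn)(double);

/* Count the cells produced by recursively bisecting [lo, hi]:
 * split while the field jump across the cell (a crude gradient
 * proxy) exceeds tol and the maximum level is not reached. */
int refine_count(field_fn f, double lo, double hi, double tol,
                 int level, int max_level)
{
    double jump = fabs(f(hi) - f(lo));
    if (level >= max_level || jump <= tol)
        return 1;                          /* cell stays at this level */
    double mid = 0.5 * (lo + hi);
    return refine_count(f, lo, mid, tol, level + 1, max_level)
         + refine_count(f, mid, hi, tol, level + 1, max_level);
}

/* Demo field with constant slope 1. */
double linear_field(double x) { return x; }
```

A MaxVoxelCount-style cap would additionally stop refinement once the global cell budget is exhausted, which is what keeps the output file small.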
23 FEM Mesh (SW Japan Model) (figure)
24 Voxel Mesh, initial (figure)
25 Voxel Mesh, 2-level adapted (figure)
26 Example of surface simplification: initial (11,884 triangles); 50% reduction (5,942 triangles); 95% reduction (594); 98% reduction (238).
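The counts above follow directly from the reduction rate; a one-line helper (hypothetical, for illustration only) reproduces them by rounding initial * (1 - rate) to the nearest integer.

```c
#include <assert.h>

/* Number of surface patches remaining after simplification at the
 * given reduction rate (0.0 .. 1.0), rounded to nearest. */
int remaining_patches(int initial, double reduction_rate)
{
    return (int)(initial * (1.0 - reduction_rate) + 0.5);
}
```

With initial = 11,884 this yields 5,942, 594, and 238 for the 50%, 95%, and 98% rates on the slide.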
27 pfem-vis: pfem3d + ppopen-math/vis, files.
>$ cd <$O-fem2>
>$ cp /home/z30088/fem2/c/fem3dv.tar .
>$ tar xvf fem3dv.tar
>$ cd <$O-fem2>/fem3dv/src
>$ cd ../run
>$ pjsub go.sh
28 pfem-vis: Makefile

PPOHVISDIR = /home/z30088/fem2/ppohvis_pfem3dstr_140115/ppohvis/
LIB_DIR = $(PPOHVISDIR)/lib
INC_DIR = $(PPOHVISDIR)/include
CFLAGSL = -I/home/z30088/ppohVIS_test/include
LDFLAGSL = -L/home/z30088/ppohVIS_test/lib
LIBSL = -lppohvisfdm3d
OPTFLAGS = -Kfast -I$(INC_DIR)
LFLAGS = -L$(LIB_DIR) $(LIBS) -lm
#
.SUFFIXES:
.SUFFIXES: .o .c
.c.o:
	$(CC) $(OPTFLAGS) -c $< -o $@
#
TARGET = ../run/sol
default: $(TARGET)
OBJS = test1_p.o pfem_init.o ...
$(TARGET): $(OBJS)
	$(CC) $(OPTFLAGS) -o $@ $(OBJS) $(LFLAGS)
clean:
	/bin/rm -f *.o *~ $(TARGET)
29 pfem-vis: contents of <$O-fem2>/fem3dv/run

cube_32x32x32_8pe.0  cube_32x32x32_8pe.1  cube_32x32x32_8pe.2  cube_32x32x32_8pe.3
cube_32x32x32_8pe.4  cube_32x32x32_8pe.5  cube_32x32x32_8pe.6  cube_32x32x32_8pe.7
go.sh  INPUT.DAT  vis.cnt
(mesh header name: cube_32x32x32_8pe)

go.sh:
#!/bin/sh
#PJM -L "node=1"
#PJM -L "elapse=00:10:00"
#PJM -L "rscgrp=lecture"
#PJM -g "gt74"
#PJM -j
#PJM -o "cube8.lst"
#PJM --mpi "proc=8"
mpiexec ./sol
30 pfem-vis: pfem3d + ppopen-math/vis data flow. Inputs: INPUT.DAT, the distributed local mesh files <HEADER>.*, and the visualization control file vis.cnt. The solver sol (linked with ppohvis) reads these and writes vis_disp.1.inp and vis_sigma-zz.1.inp, which can be opened in ParaView.
31 pfem-vis: C/main (1/2)

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <mpi.h>
FILE* fp_log;
#define GLOBAL_VALUE_DEFINE
#include "pfem_util.h"
#include "ppohvis_pfem3d_util.h"

extern void PFEM_INIT(int,char**);
extern void INPUT_CNTL();
extern void INPUT_GRID();
extern void MAT_CON0();
extern void MAT_CON1();
extern void MAT_ASS_MAIN();
extern void MAT_ASS_BC();
extern void SOLVE33();
extern void RECOVER_STRESS();
extern void PFEM_FINALIZE();

int main(int argc, char* argv[])
{
  double START_TIME, END_TIME;
  struct ppohvis_base_stcontrol *pcontrol = NULL;
  struct ppohvis_base_stresultcollection *pnoderesult = NULL;

  PFEM_INIT(argc, argv);
  ppohvis_pfem3d_init(MPI_COMM_WORLD);
  INPUT_CNTL();
  pcontrol = ppohvis_pfem3d_getcontrol("vis.cnt");
  INPUT_GRID();
  if (ppohvis_pfem3d_setmeshex(
        NP, N, NODE_ID, XYZ,
        ICELTOT, ICELTOT_INT, ELEM_ID, ICELNOD,
        NEIBPETOT, NEIBPE, IMPORT_INDEX, IMPORT_ITEM,
        EXPORT_INDEX, EXPORT_ITEM)) {
    ppohvis_base_printerror(stderr);
    MPI_Abort(MPI_COMM_WORLD, errno);
  }
32 pfem-vis: C/main (2/2)

  SOLVE33();
  RECOVER_STRESS();

  pnoderesult = ppohvis_base_allocateresultcollection();
  if (pnoderesult == NULL) {
    ppohvis_base_printerror(stderr);
    MPI_Abort(MPI_COMM_WORLD, errno);
  }
  pnoderesult->results[0] =
    ppohvis_pfem3d_convresultnodeitem(NP, 3, "disp", X);
  pnoderesult->results[1] =
    ppohvis_pfem3d_convresultnodeitempart(NP, 3, 2, "sgm-zz", SIGMA_N);
  if (ppohvis_pfem3d_visualize(pnoderesult, NULL, pcontrol, "vis", 1)) {
    ppohvis_base_printerror(stderr);
    MPI_Abort(MPI_COMM_WORLD, errno);
  }
  ppohvis_pfem3d_finalize();
  PFEM_FINALIZE();
}
33 pfem-vis: vis.cnt

[Refine] (section for refinement control)
AvailableMemory = 2.0 (available memory size in GB; not in use)
MaxVoxelCount = 1000 (max number of voxels)
MaxRefineLevel = 20 (max voxel refinement level)
[Simple] (section for simplification control)
ReductionRate = 0.0 (reduction rate of surface patches)

With these settings, 32,768 elements and 39,949 nodes are reduced to 554 elements and 828 nodes. Deformed-shape figures are not supported.
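vis.cnt entries are simple "Key = value" lines grouped into sections. Extracting one integer parameter can be sketched as follows (an illustrative helper, not the library API; in the actual code the file is read through ppohvis_pfem3d_getcontrol("vis.cnt")):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find "key = <int>" in a vis.cnt-style text buffer and return the
 * value, or default_val if the key is absent. Minimal sketch: no
 * section awareness, first occurrence wins. */
int get_int_param(const char *text, const char *key, int default_val)
{
    const char *p = strstr(text, key);
    int v;
    if (p && sscanf(p + strlen(key), " = %d", &v) == 1)
        return v;
    return default_val;
}
```

For example, querying "MaxVoxelCount" in the [Refine] section above returns 1000, while a missing key falls back to the supplied default.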
More informationAlgorithms, System and Data Centre Optimisation for Energy Efficient HPC
2015-09-14 Algorithms, System and Data Centre Optimisation for Energy Efficient HPC Vincent Heuveline URZ Computing Centre of Heidelberg University EMCL Engineering Mathematics and Computing Lab 1 Energy
More informationHPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)
HPC and IT Issues Session Agenda Deployment of Simulation (Trends and Issues Impacting IT) Discussion Mapping HPC to Performance (Scaling, Technology Advances) Discussion Optimizing IT for Remote Access
More informationANSYS HPC. Technology Leadership. Barbara Hutchings ANSYS, Inc. September 20, 2011
ANSYS HPC Technology Leadership Barbara Hutchings barbara.hutchings@ansys.com 1 ANSYS, Inc. September 20, Why ANSYS Users Need HPC Insight you can t get any other way HPC enables high-fidelity Include
More informationMAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures
MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures Stan Tomov Innovative Computing Laboratory University of Tennessee, Knoxville OLCF Seminar Series, ORNL June 16, 2010
More informationMaking Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010
Making Supercomputing More Available and Accessible Windows HPC Server 2008 R2 Beta 2 Microsoft High Performance Computing April, 2010 Windows HPC Server 2008 R2 Windows HPC Server 2008 R2 makes supercomputing
More informationIntel Performance Libraries
Intel Performance Libraries Powerful Mathematical Library Intel Math Kernel Library (Intel MKL) Energy Science & Research Engineering Design Financial Analytics Signal Processing Digital Content Creation
More informationEarly Experiences Writing Performance Portable OpenMP 4 Codes
Early Experiences Writing Performance Portable OpenMP 4 Codes Verónica G. Vergara Larrea Wayne Joubert M. Graham Lopez Oscar Hernandez Oak Ridge National Laboratory Problem statement APU FPGA neuromorphic
More informationYasuo Okabe. Hitoshi Murai. 1. Introduction. 2. Evaluation. Elapsed Time (sec) Number of Processors
Performance Evaluation of Large-scale Parallel Simulation Codes and Designing New Language Features on the (High Performance Fortran) Data-Parallel Programming Environment Project Representative Yasuo
More informationLecture 2 Unstructured Mesh Generation
Lecture 2 Unstructured Mesh Generation MIT 16.930 Advanced Topics in Numerical Methods for Partial Differential Equations Per-Olof Persson (persson@mit.edu) February 13, 2006 1 Mesh Generation Given a
More informationApplication Example Running on Top of GPI-Space Integrating D/C
Application Example Running on Top of GPI-Space Integrating D/C Tiberiu Rotaru Fraunhofer ITWM This project is funded from the European Union s Horizon 2020 Research and Innovation programme under Grant
More informationAACE: Applications. Director, Application Acceleration Center of Excellence National Institute for Computational Sciences glenn-
AACE: Applications R. Glenn Brook Director, Application Acceleration Center of Excellence National Institute for Computational Sciences glenn- brook@tennessee.edu Ryan C. Hulguin Computational Science
More informationAchieving Efficient Strong Scaling with PETSc Using Hybrid MPI/OpenMP Optimisation
Achieving Efficient Strong Scaling with PETSc Using Hybrid MPI/OpenMP Optimisation Michael Lange 1 Gerard Gorman 1 Michele Weiland 2 Lawrence Mitchell 2 Xiaohu Guo 3 James Southern 4 1 AMCG, Imperial College
More informationPERFORMANCE PORTABILITY WITH OPENACC. Jeff Larkin, NVIDIA, November 2015
PERFORMANCE PORTABILITY WITH OPENACC Jeff Larkin, NVIDIA, November 2015 TWO TYPES OF PORTABILITY FUNCTIONAL PORTABILITY PERFORMANCE PORTABILITY The ability for a single code to run anywhere. The ability
More informationSpeedup Altair RADIOSS Solvers Using NVIDIA GPU
Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair
More informationSolving Large Complex Problems. Efficient and Smart Solutions for Large Models
Solving Large Complex Problems Efficient and Smart Solutions for Large Models 1 ANSYS Structural Mechanics Solutions offers several techniques 2 Current trends in simulation show an increased need for
More informationANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation
ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation Ray Browell nvidia Technology Theater SC12 1 2012 ANSYS, Inc. nvidia Technology Theater SC12 HPC Revolution Recent
More informationOpenACC2 vs.openmp4. James Lin 1,2 and Satoshi Matsuoka 2
2014@San Jose Shanghai Jiao Tong University Tokyo Institute of Technology OpenACC2 vs.openmp4 he Strong, the Weak, and the Missing to Develop Performance Portable Applica>ons on GPU and Xeon Phi James
More informationEfficient Multi-GPU CUDA Linear Solvers for OpenFOAM
Efficient Multi-GPU CUDA Linear Solvers for OpenFOAM Alexander Monakov, amonakov@ispras.ru Institute for System Programming of Russian Academy of Sciences March 20, 2013 1 / 17 Problem Statement In OpenFOAM,
More informationFramework for Separation of Concerns Between Application Requirements and System Requirements
1 Framework for Separation of Concerns Between Application Requirements and System Requirements SIAM Conference on Computational Science & Engineering 2015 Mar 18, 2015@Salt Lake City Convention Center
More informationIntroduction to OpenMP
1 / 7 Introduction to OpenMP: Exercises and Handout Introduction to OpenMP Christian Terboven Center for Computing and Communication, RWTH Aachen University Seffenter Weg 23, 52074 Aachen, Germany Abstract
More informationgpot: Intelligent Compiler for GPGPU using Combinatorial Optimization Techniques
gpot: Intelligent Compiler for GPGPU using Combinatorial Optimization Techniques Yuta TOMATSU, Tomoyuki HIROYASU, Masato YOSHIMI, Mitsunori MIKI Graduate Student of School of Ewngineering, Faculty of Department
More informationVisIt Libsim. An in-situ visualisation library
VisIt Libsim. An in-situ visualisation library December 2017 Jean M. Favre, CSCS Outline Motivations In-situ visualization In-situ processing strategies VisIt s libsim library Enable visualization in a
More informationOn Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators
On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC
More informationExperiences with ENZO on the Intel Many Integrated Core Architecture
Experiences with ENZO on the Intel Many Integrated Core Architecture Dr. Robert Harkness National Institute for Computational Sciences April 10th, 2012 Overview ENZO applications at petascale ENZO and
More information