Extreme Simulations on ppOpen-HPC

1 Extreme Simulations on ppOpen-HPC
Kengo Nakajima, Information Technology Center, The University of Tokyo
Challenges in Extreme Engineering Algorithms, ISC'16, June 20, 2016, Frankfurt am Main, Germany

2 Summary
ppOpen-HPC is an open-source infrastructure for the development and execution of optimized and reliable simulation codes on post-peta-scale (pp) parallel computers based on many-core architectures, with automatic tuning (AT). It consists of various types of libraries covering general procedures for scientific computation.
Some recent activities:
Recent achievements in the development of ppOpen-MATH/MG, a geometric multigrid solver
Coupled earthquake simulations by ppOpen-MATH/MP

3 ppOpen-HPC
ppOpen-MATH
  ppOpen-MATH/MG: Multigrid Solver (Groundwater Flow)
  ppOpen-MATH/MP: Coupler (Coupled Earthquake Simulations)
Future/On-going Works

4 ppOpen-HPC: Overview
Application framework with automatic tuning (AT); pp: post-peta-scale
Five-year project, FY 2011-2015 (since April 2011)
P.I.: Kengo Nakajima (ITC, The University of Tokyo)
Part of "Development of System Software Technologies for Post-Peta Scale High Performance Computing", funded by JST/CREST (Supervisor: Prof. Mitsuhisa Sato, RIKEN AICS)
Target: Post-T2K system (original schedule: FY 2015); could be extended to various types of platforms
Team with 7 institutes, >50 people (5 PDs) from various fields: Co-Design
ITC/U.Tokyo, AORI/U.Tokyo, ERI/U.Tokyo, FS/U.Tokyo, Hokkaido U., Kyoto U., JAMSTEC

5 Group Leaders
Masaki Satoh (AORI/U.Tokyo)
Takashi Furumura (ERI/U.Tokyo)
Hiroshi Okuda (GSFS/U.Tokyo)
Takeshi Iwashita (Kyoto U., ITC/Hokkaido U.)
Hide Sakaguchi (IFREE/JAMSTEC)
Main Members
Takahiro Katagiri (ITC/U.Tokyo)
Masaharu Matsumoto (ITC/U.Tokyo)
Hideyuki Jitsumoto (Tokyo Tech)
Satoshi Ohshima (ITC/U.Tokyo)
Hiroyasu Hasumi (AORI/U.Tokyo)
Takashi Arakawa (RIST)
Futoshi Mori (ERI/U.Tokyo)
Takeshi Kitayama (GSFS/U.Tokyo)
Akihiro Ida (ACCMS/Kyoto U. -> University of Tokyo)
Miki Yamamoto (IFREE/JAMSTEC)
Daisuke Nishiura (IFREE/JAMSTEC)

6 (Diagram) ppOpen-HPC structure: application development frameworks, math libraries, automatic tuning (AT), and system software.

7 (Diagram) What ppOpen-HPC covers.

8 Supercomputers in ITC/U.Tokyo: 2 big systems, 6-year cycle
Hitachi SR11K/J2 (IBM Power 5+): 18.8 TFLOPS, 16.4 TB
Hitachi HA8000 (T2K, AMD Opteron): 140 TFLOPS, 31.3 TB
Yayoi: Hitachi SR16000/M1 (IBM Power): 11.2 TB
Oakleaf-FX: Fujitsu PRIMEHPC FX10 (SPARC64 IXfx): 1.13 PFLOPS, 150 TB
Oakbridge-FX: 18.4 TB
Oakforest-PACS: Fujitsu, Intel KNL: 25 PFLOPS, 919.3 TB; JCAHPC: U.Tsukuba & U.Tokyo
Post-FX10 system: PFLOPS-scale (?)
Reedbush (SGI, Broadwell + Pascal): U.Tokyo's first system with GPUs, for CSE & Big Data
(The timeline also shows the K computer and Post-K.)

9 Post-T2K System: Oakforest-PACS
25 PFLOPS, Fall 2016, by Fujitsu; 8,208 Intel Xeon Phi (KNL) nodes
Full operation starts on December 1, 2016
Joint Center for Advanced High Performance Computing (JCAHPC): University of Tsukuba & University of Tokyo
The new system will be installed at the Kashiwa-no-Ha ("Leaf of Oak") Campus of U.Tokyo, which is located between Tokyo and Tsukuba
Please visit booth #510 for more detailed information.

10 Schedule of Public Release (with English documents, MIT License)
Released at SC-XY (or can be downloaded)
Multicore/manycore cluster version (Flat MPI, OpenMP/MPI hybrid) with documents in English
We are now focusing on MIC/Xeon Phi
Collaborations are welcome
History:
SC12, Nov 2012 (Ver.0.1.0)
SC13, Nov 2013 (Ver.0.2.0)
SC14, Nov 2014 (Ver.0.3.0)
SC15, Nov 2015 (Ver.1.0.0)

11 New Features in Ver.1.0.0 (released at SC15)
HACApK library for H-matrix computations in ppOpen-APPL/BEM (OpenMP/MPI hybrid version): the first open-source H-matrix library parallelized with OpenMP/MPI hybrid
ppOpen-MATH/MP (coupler for multiphysics simulations, loose coupling of FEM & FDM)
Matrix assembly and linear solvers for ppOpen-APPL/FVM

12 Collaborations, Outreach
International collaborations: Lawrence Berkeley National Lab.; National Taiwan University; National Central University (Taiwan); ESSEX/SPPEXA/DFG, Germany; IPCC (Intel Parallel Computing Center)
Outreach and applications: large-scale simulations (geologic CO2 storage, astrophysics, earthquake simulations, etc.) using ppOpen-AT, ppOpen-MATH/VIS, ppOpen-MATH/MP, and the linear solvers
International workshops (2012, 2013, 2015); tutorials, classes

13 Example: Geologic CO2 Storage
3D multiphase flow (liquid/gas) + 3D mass transfer [Dr. Hajime Yamamoto, Taisei]

14 Density Convections for 1,000 Years: Flow Model
Only the far side of the vertical cross section passing through the injection well is depicted. [Dr. Hajime Yamamoto, Taisei]
The meter-scale fingers gradually developed into larger ones in the field-scale model.
A huge number of time steps (>10^5) was required to complete the 1,000-year simulation.
The onset time (10-20 yrs) is comparable to theory (linear stability analysis, 15.5 yrs).
COPYRIGHT TAISEI CORPORATION ALL RIGHTS RESERVED

15 Simulation of Geologic CO2 Storage
30 million DoF (10 million grid nodes x 3 DoF/node); 3D multiphase flow (liquid/gas) + 3D mass transfer
(Plot: average time for solving the matrix for one time step vs. number of processors, TOUGH2-MP on FX10.)
Fujitsu FX10 (Oakleaf-FX), 30M DOF: 2x-3x improvement [Dr. Hajime Yamamoto, Taisei]
COPYRIGHT TAISEI CORPORATION ALL RIGHTS RESERVED

16 Support
FY 2011-2015: Post-Peta CREST (JST/CREST)
From FY 2016: ESSEX-II (JST/CREST & DFG/SPPEXA (Germany) collaboration)
ESSEX: Equipping Sparse Solvers for Exascale; Leading PI: Prof. Gerhard Wellein (U. Erlangen)
ESSEX-II = ESSEX + U.Tsukuba (Sakurai) + U.Tokyo (Nakajima)

17 ppOpen-HPC
ppOpen-MATH
  ppOpen-MATH/MG: Multigrid Solver
  ppOpen-MATH/MP: Coupler
ppOpen-AT
Future/On-going Works

19 ppOpen-MATH
A set of common numerical libraries:
Multigrid solvers (ppOpen-MATH/MG)
Parallel graph libraries (ppOpen-MATH/GRAPH): multithreaded RCM for reordering (under development)
Parallel visualization (ppOpen-MATH/VIS)
Library for coupled multiphysics simulations, loose coupling (ppOpen-MATH/MP)
  Originally developed as a coupler for NICAM (atmosphere, unstructured) and COCO (ocean, structured) in global climate simulations on the K computer; both are major codes on the K computer (Prof. Masaki Satoh, AORI/U.Tokyo: NICAM; Prof. Hiroyasu Hasumi, AORI/U.Tokyo: COCO)
  The coupler has been extended to more general use: coupled seismic simulations

20 Sparse Linear Solvers in ppOpen-HPC (OpenMP+MPI hybrid)
Multicoloring/RCM/CM-RCM reordering for OpenMP (coloring procedures are NOT parallelized yet; see the sketch below)
ppOpen-APPL/FEM, FVM, FDM: ILU/BILU(p,d,t) + CG/GPBiCG/GMRES; depth of overlapping; Hierarchical Interface Decomposition (HID) [Henon & Saad 2007]; extended HID [KN 2010]
ppOpen-MATH/MG: geometric multigrid solvers/preconditioners; comm./synch. avoiding/reducing based on hCGA [KN 2014, Best Paper Award at IEEE ICPADS 2014]
ppOpen-APPL/BEM: H-matrix solver HACApK, the only open-source H-matrix solver library with OpenMP/MPI hybrid parallelization
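
The reordering item above refers to the standard pattern for running ILU/IC-type sweeps under OpenMP: rows are grouped by color so that rows of the same color have no mutual dependency. The following C sketch is illustrative only, not ppOpen-HPC source; all array names (such as color_ptr) are assumptions.

```c
/* Minimal sketch (not ppOpen-HPC source): forward substitution of a
 * lower-triangular factor parallelized by multicoloring.  Rows are
 * assumed to be renumbered so that color c owns rows
 * color_ptr[c] .. color_ptr[c+1]-1, and rows of the same color have
 * no dependencies on each other.  The CRS arrays hold only the
 * strictly lower triangular entries.                                 */
#include <omp.h>

void forward_substitution_mc(int ncolor, const int *color_ptr,
                             const int *row_ptr, const int *col_idx,
                             const double *val, const double *diag_inv,
                             const double *b, double *x)
{
    for (int c = 0; c < ncolor; c++) {            /* colors: sequential */
        #pragma omp parallel for schedule(static) /* rows of one color: parallel */
        for (int i = color_ptr[c]; i < color_ptr[c + 1]; i++) {
            double s = b[i];
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                s -= val[k] * x[col_idx[k]];
            x[i] = s * diag_inv[i];
        }
    }
}
```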

21 ppOpen-MATH/MG: pgw3d-FVM
3D groundwater flow through heterogeneous porous media, governed by the Poisson-type equation $\nabla\cdot\left(\lambda(x,y,z)\nabla\phi\right) = q$ with randomly distributed water conductivity $\lambda = 10^{-5} \sim 10^{+5}$ (average: 1.00)
Finite Volume Method on a cubic voxel mesh (a stencil sketch follows below)
MGCG solver
Storage format of coefficient matrices (serial comm.): CRS (Compressed Row Storage), ELL (Ellpack-Itpack)
Comm./synch.-reducing MG (parallel comm.): Coarse Grid Aggregation (CGA), Hierarchical CGA (hCGA, communication-reducing CGA)
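
For illustration only, here is a minimal sketch of how a 7-point finite-volume row could be assembled for this kind of heterogeneous problem on a uniform voxel mesh, using a harmonic mean for the face conductivity. This is an assumption-laden toy, not the pgw3d-FVM discretization itself.

```c
/* Illustrative only: 7-point finite-volume coefficients for the
 * diffusion operator with cell-centered conductivity lambda[] on a
 * uniform voxel mesh of spacing h.  The face value uses the harmonic
 * mean, a common choice for strongly heterogeneous media; signs
 * correspond to the usual symmetric positive-definite form.          */
static double face_coef(double lam_p, double lam_q, double h)
{
    /* harmonic mean of the two cell conductivities, times area/dist = h */
    return 2.0 * lam_p * lam_q / (lam_p + lam_q) * h;
}

/* Assemble one matrix row (cell id = i + nx*(j + ny*k)):
 * off-diagonal coefficients go to off[6], the diagonal is returned.   */
double assemble_row(int i, int j, int k, int nx, int ny, int nz,
                    const double *lambda, double h, double off[6])
{
    int id = i + nx * (j + ny * k);
    int nbr[6][3] = { {-1,0,0},{1,0,0},{0,-1,0},{0,1,0},{0,0,-1},{0,0,1} };
    double diag = 0.0;
    for (int f = 0; f < 6; f++) {
        int ii = i + nbr[f][0], jj = j + nbr[f][1], kk = k + nbr[f][2];
        if (ii < 0 || ii >= nx || jj < 0 || jj >= ny || kk < 0 || kk >= nz) {
            off[f] = 0.0;                       /* no-flux boundary face */
            continue;
        }
        int idn = ii + nx * (jj + ny * kk);
        double a = face_coef(lambda[id], lambda[idn], h);
        off[f] = -a;                            /* off-diagonal entry   */
        diag  +=  a;                            /* diagonal accumulates */
    }
    return diag;
}
```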

22 Linear Solvers
Preconditioned CG method with (geometric) multigrid preconditioning (MGCG); IC(0) as the smoothing operator (smoother): good for ill-conditioned problems
Parallel geometric multigrid method: 8 fine meshes (children) form 1 coarse mesh (parent) in an isotropic manner (octree); V-cycle (a toy V-cycle sketch follows below)
Domain-decomposition-based: localized block-Jacobi with overlapped Additive Schwarz Domain Decomposition (ASDD)
Operations at the coarsest level use a single core (redundant)
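
To make the V-cycle structure concrete, here is a toy 1-D geometric multigrid V-cycle in C with damped-Jacobi smoothing and a direct solve at the coarsest level. It only illustrates the recursion that MGCG applies as a preconditioner; ppOpen-MATH/MG itself works on 3-D voxel meshes with IC(0) smoothing, octree coarsening, and a redundant single-core coarse-grid solve.

```c
/* Toy sketch of a geometric multigrid V-cycle for -u'' = f with
 * homogeneous Dirichlet boundaries and n = 2^k - 1 interior points.  */
#include <stdlib.h>
#include <string.h>

static void residual(const double *u, const double *f, double *r, int n, double h)
{
    for (int i = 0; i < n; i++) {
        double ul = (i > 0)     ? u[i - 1] : 0.0;
        double ur = (i < n - 1) ? u[i + 1] : 0.0;
        r[i] = f[i] - (2.0 * u[i] - ul - ur) / (h * h);
    }
}

static void smooth(double *u, const double *f, int n, double h, int sweeps)
{
    const double w = 2.0 / 3.0;                     /* damped Jacobi weight */
    double *unew = malloc(n * sizeof(double));
    for (int s = 0; s < sweeps; s++) {
        for (int i = 0; i < n; i++) {
            double ul = (i > 0)     ? u[i - 1] : 0.0;
            double ur = (i < n - 1) ? u[i + 1] : 0.0;
            unew[i] = (1.0 - w) * u[i] + w * 0.5 * (ul + ur + h * h * f[i]);
        }
        memcpy(u, unew, n * sizeof(double));
    }
    free(unew);
}

/* V-cycle: pre-smooth, restrict residual, recurse, prolong correction,
 * post-smooth.  At the coarsest level (n == 1) the single equation is
 * solved directly, the analogue of the redundant coarse-grid solve.   */
void vcycle(double *u, const double *f, int n, double h)
{
    if (n == 1) { u[0] = 0.5 * h * h * f[0]; return; }
    smooth(u, f, n, h, 2);                          /* pre-smoothing */

    int nc = (n - 1) / 2;                           /* coarse interior points */
    double *r  = malloc(n  * sizeof(double));
    double *rc = calloc(nc,  sizeof(double));
    double *ec = calloc(nc,  sizeof(double));
    residual(u, f, r, n, h);
    for (int i = 0; i < nc; i++)                    /* full-weighting restriction */
        rc[i] = 0.25 * r[2*i] + 0.5 * r[2*i + 1] + 0.25 * r[2*i + 2];
    vcycle(ec, rc, nc, 2.0 * h);                    /* coarse-grid correction */
    for (int i = 0; i < nc; i++) {                  /* linear prolongation */
        u[2*i + 1] += ec[i];
        u[2*i]     += 0.5 * ec[i];
        u[2*i + 2] += 0.5 * ec[i];
    }
    free(r); free(rc); free(ec);

    smooth(u, f, n, h, 2);                          /* post-smoothing */
}
```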

23 Optimization of Serial Communication
ELL (Ellpack-Itpack) and Sliced ELL for matrix storage: (a) CRS, (b) ELL (kernels sketched below)
Separate arrays are introduced for boundary cells (AUnew6(6,N)) and pure internal cells (AUnew3(3,N))
There are a lot of "X-ELL-Y-Z" variants, mostly focusing on SpMV, e.g. SELL-C-σ [M. Kreutzer et al.]
Recently, such formats have also been applied to forward/backward substitutions with data dependency (e.g., HPCG); Gauss-Seidel is easy, while ILU/IC is much more difficult than GS
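
The point of ELL is visible directly in the SpMV kernels: compared with CRS, the padded ELL layout gives a fixed-length, easily vectorizable inner loop with contiguous accesses. The kernels below are illustrative sketches, not ppOpen-HPC code; padded ELL entries are assumed to carry value 0 and a valid column index.

```c
/* Illustrative CRS and ELL SpMV kernels (not ppOpen-HPC source). */
void spmv_crs(int n, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)  /* variable length */
            s += val[k] * x[col_idx[k]];
        y[i] = s;
    }
}

void spmv_ell(int n, int nnz_max, const int *col_idx, /* [nnz_max*n], padded */
              const double *val,                      /* zeros in padding    */
              const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int k = 0; k < nnz_max; k++)             /* fixed trip count    */
            s += val[k * n + i] * x[col_idx[k * n + i]];
        y[i] = s;
    }
}
```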

24 Original Approach
Coarse grid solver at a single core [KN 2010]
(Diagram: restriction through Level = 1, 2, ..., m-3, m-2, m-1, m, from fine to coarse; the mesh count per MPI process reaches 1 at the coarsest level; the coarse grid solver runs on a single core with further multigrid.)

25 Coarse Grid Aggregation (CGA)
The coarse grid solver is multithreaded [KN 2012]
(Diagram: Level = 1, 2, ..., m-3, m-2, then coarse; the coarse grid solver runs on a single MPI process, multithreaded, with further multigrid.)
Communication overhead could be reduced, but the coarse grid solver is more expensive than in the original approach; if the process count is larger, this effect might be significant.

26 Computations on Fujitsu FX10
Fujitsu PRIMEHPC FX10 at U.Tokyo (Oakleaf-FX), the commercial version of K: 16 cores/node, flat/uniform access to memory, 4,800 nodes, 1.13 PF (65th on the TOP500, June 2015)
Up to 4,096 nodes (65,536 cores) used (Large-Scale HPC Challenge), max 17,179,869,184 unknowns
Flat MPI, HB 4x4, HB 8x2, HB 16x1
Weak scaling; strong scaling: 16,777,216 unknowns, from 8 to 4,096 nodes
Network topology is not specified (1D)
(Diagram: node with 16 cores, L1 caches, shared L2, and memory.)

27 Weak Scaling: up to 4,096 nodes, Flat MPI
Up to 17,179,869,184 meshes (64^3 meshes/core)
(Plot: time in sec vs. core #, DOWN is GOOD; Original, ELL-CGA, SELL-CGA.)
The coarse grid solver is more expensive than in the original approach; if the process count is larger, this effect might be significant.

28 Hierarchical CGA (hCGA): Communication-Reducing MG
Reduced number of MPI processes [KN 2014]
(Diagram: Level = 1, 2, ..., m-3, m-2, then coarse; the coarse grid solver runs on a single MPI process, multithreaded, with further multigrid.)
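
The communicator handling behind the hCGA idea can be sketched in a few lines of MPI: below a chosen switching level, only a subset of ranks keeps working on the coarse problem, so coarse-level communication involves far fewer processes. This is an assumed illustration, not the ppOpen-MATH/MG implementation; the stride parameter and function name are made up for the example.

```c
/* Minimal sketch (assumed, not ppOpen-MATH/MG code) of a reduced
 * communicator for coarse multigrid levels: world ranks
 * 0, stride, 2*stride, ... keep working, all other ranks get
 * MPI_COMM_NULL and simply wait.                                     */
#include <mpi.h>

MPI_Comm make_coarse_comm(MPI_Comm world, int stride)
{
    int rank;
    MPI_Comm_rank(world, &rank);
    int color = (rank % stride == 0) ? 0 : MPI_UNDEFINED;
    MPI_Comm coarse;
    MPI_Comm_split(world, color, rank, &coarse);
    return coarse;   /* MPI_COMM_NULL on excluded ranks */
}
```

Data owned by the excluded ranks would be gathered onto the retained ranks before the coarse-level cycles (e.g. with MPI_Gatherv on the original communicator) and scattered back afterwards.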

29 hCGA: Related Work
Not a new idea, but very few implementations; considered not effective for peta-scale systems (Dr. U. M. Yang (LLNL), developer of hypre)
Existing works: repartitioning at coarse levels
Lin, P.T., Improving multigrid performance for unstructured mesh drift-diffusion simulations on 147,000 cores, International Journal for Numerical Methods in Engineering 91 (2012) (Sandia)
Sundar, H. et al., Parallel Geometric-Algebraic Multigrid on Unstructured Forests of Octrees, Proceedings of the 2012 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC12) (2012) (UT Austin): Flat MPI, repartitioning if DOF < O(10^3) on each process

30 Weak Scaling: up to 4,096 nodes, Flat MPI
Up to 17,179,869,184 meshes (64^3 meshes/core)
(Plot: time in sec vs. core #, DOWN is GOOD; Original, ELL-CGA, SELL-CGA, SELL-hCGA; hCGA gives a 1.61x improvement at the largest scale.)

31 Strong Scaling: up to 4,096 nodes, Flat MPI
100%: efficiency at 128 cores; 268,435,456 meshes (16^3 meshes/core at 4,096 nodes)
(Plot: parallel performance (%) vs. core #, UP is GOOD; SELL+CGA, SELL+hCGA; hCGA gives a 6.27x improvement at the largest scale.)

32 Weak Scaling: up to 4,096 nodes
Up to 17,179,869,184 meshes (64^3 meshes/core); DOWN is GOOD
Cases (matrix storage / coarse grid treatment):
C0: CRS, single core
C1: ELL (original), single core
C2: ELL (original), CGA
C3: ELL (sliced), CGA
C4: ELL (sliced), hCGA
(Plots: time in sec vs. core # for Flat MPI (C3, C4), HB 4x4 (C4), HB 8x2 (C3), HB 16x1; and a comparison of C3 vs. C4 at 4,096 nodes for Flat MPI, HB 4x4, HB 8x2, HB 16x1.)

33 Summary
hCGA is effective (only for Flat MPI): x1.61 for weak scaling and x6.27 for strong scaling at 4,096 nodes of Fujitsu FX10
hCGA will be effective for HB 16x1 with more than 2.50x10^5 nodes (= 4.00x10^6 cores) of FX10 (= 60 PFLOPS)
Computation time of the coarse grid solver is significant for Flat MPI with >10^3 nodes
Future works: improvement of hCGA; CA-Multigrid (for coarser levels), CA-SPAI, pipelined methods, SELL-C-σ; strategy for automatic selection (switching level and number of processes for hCGA, optimum color #); Post-T2K (Oakforest-PACS)

34 ppOpen-HPC
ppOpen-MATH
  ppOpen-MATH/MG: Multigrid Solver
  ppOpen-MATH/MP: Coupler
ppOpen-AT
Future/On-going Works

35 ppOpen-MATH
A set of common numerical libraries:
Multigrid solvers (ppOpen-MATH/MG)
Parallel graph libraries (ppOpen-MATH/GRAPH): multithreaded RCM for reordering (under development)
Parallel visualization (ppOpen-MATH/VIS)
Library for coupled multiphysics simulations, loose coupling (ppOpen-MATH/MP)
  Originally developed as a coupler for NICAM (atmosphere, unstructured) and COCO (ocean, structured) in global climate simulations on the K computer; both are major codes on the K computer (Prof. Masaki Satoh, AORI/U.Tokyo: NICAM; Prof. Hiroyasu Hasumi, AORI/U.Tokyo: COCO)
  The coupler has been extended to more general use: coupled seismic simulations

36 (Diagram) ppOpen-MATH/MP (J-cup) Coupler
Atmospheric models: NICAM (semi-unstructured grid; A-grid and ZM-grid versions), MIROC-A (FDM/structured grid)
Ocean models: COCO (tri-polar FDM), regional COCO, Matsumura model (non-hydrostatic regional ocean model)
Coupler functions: grid transformation, multi-ensemble M x N I/O, pre- and post-processing, fault tolerance, on the post-peta-scale system (system S/W, architecture)
c/o T. Arakawa, M. Satoh

37 Sea surface temperature (OSST): left, COCO (ocean, structured); right, NICAM (atmosphere, semi-unstructured). c/o T. Arakawa, M. Satoh

38 Thickness of sea ice (OHI): left, COCO (ocean, structured); right, NICAM (atmosphere, semi-unstructured). c/o T. Arakawa, M. Satoh

39 Weakly Coupled Simulation with the ppOpen-HPC Libraries
Two kinds of applications, Seism3D+ (based on FDM, ppOpen-APPL/FDM) and FrontISTR++ (based on FEM, ppOpen-APPL/FEM), are connected by the ppOpen-MATH/MP coupler; velocity from Seism3D+ is passed through the coupler and displacement is provided to FrontISTR++.
Principal functions: make a mapping table; convert physical variables; choose the timing of data transmission. (A schematic exchange is sketched below.)
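
The one-way exchange performed at each coupling step can be sketched in C/MPI as below. All function names, the message tag, and the mapping-table layout (map_idx, map_w) are illustrative assumptions, not the ppOpen-MATH/MP API; they only show the send/receive-and-interpolate pattern that the coupler encapsulates.

```c
/* Schematic only: one-way FDM -> FEM exchange of surface velocities
 * through a precomputed mapping table.                               */
#include <mpi.h>

enum { TAG_VEL = 100 };

/* FDM side: send the 3 velocity components of its surface grid points
 * to the partner FEM rank over the coupling communicator.             */
void fdm_send_velocity(MPI_Comm coupler, int fem_rank,
                       const double *vel, int n_surface_pts)
{
    MPI_Send(vel, 3 * n_surface_pts, MPI_DOUBLE, fem_rank, TAG_VEL, coupler);
}

/* FEM side: receive the FDM surface field and interpolate it onto FEM
 * nodes; node i uses nmap FDM points map_idx[i*nmap+m] with weights
 * map_w[i*nmap+m].                                                     */
void fem_recv_velocity(MPI_Comm coupler, int fdm_rank,
                       int n_fdm_pts, int n_fem_nodes, int nmap,
                       const int *map_idx, const double *map_w,
                       double *buf /* 3*n_fdm_pts */, double *node_vel)
{
    MPI_Recv(buf, 3 * n_fdm_pts, MPI_DOUBLE, fdm_rank, TAG_VEL, coupler,
             MPI_STATUS_IGNORE);
    for (int i = 0; i < n_fem_nodes; i++)
        for (int d = 0; d < 3; d++) {
            double s = 0.0;
            for (int m = 0; m < nmap; m++)
                s += map_w[i * nmap + m] * buf[3 * map_idx[i * nmap + m] + d];
            node_vel[3 * i + d] = s;
        }
}
```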

40 Dataflow of ppOpen-MATH/MP (diagram)

41 Coupling Simulation of Seismic Waves and Building Vibrations
The coupling simulation uses one-way data communication from the FDM (seismic wave propagation) to the FEM (dynamic structure analysis).

42 Numerical Model Description (figure)

43 Implementation of the Coupling Simulation (figure)

44 Practical Simulation on Oakleaf-FX
The simulation target is the earthquake that occurred at Awaji Island on 13 April 2013. The computational domain of Seism3D+ covers 60 km around Awaji Island, and that of FrontISTR++ is the actual building of the RIKEN Advanced Institute for Computational Science (AICS), Port Island, Kobe, modeled by an unstructured mesh.
Seism3D+: grid points (x, y, z) = (1536, 1536, 1600); parallelization: 2,560 processes x 16 threads
FrontISTR++: 600 million grid points (AICS building); parallelization: 1,000 processes x 16 threads (@Port Island) and 1,000 processes x 16 threads (@Kobe Stadium)
Total: 4,560 nodes on Oakleaf-FX (Seism3D+: 2,560 nodes, FrontISTR++: 2,000 nodes)
(Figure: computational domain of Seism3D+.)

45 2,560 nodes for FDM + 2,000 nodes for FEM = 4,560 nodes of FX10
(Figure: vertical cross section from east to west down to the underground geological layers; seismic wave propagation by Seism3D+ (red: P-wave, green: S-wave); building vibration by FrontISTR++.)

46 2,560 nodes for FDM + 2,000 nodes for FEM = 4,560 nodes of FX10
The coupled simulation was executed on large-scale computational resources of the Oakleaf-FX supercomputer system. Seismic wave propagation (Seism3D+) for a simulation time of 90 sec. and building vibration (FrontISTR++) for a simulation time of 20 sec. were calculated.
Comparison between simulation time and execution time:
Seism3D+: 90 sec. simulated, 6 hours of execution
FrontISTR++: 20 sec. simulated, 16 hours of execution
It was revealed that the way memory allocation is handled in the coupler causes problems when such a large-scale simulation is performed.

47 Summary
The implementation of a multi-scale coupled simulation of seismic waves and building vibrations was introduced, and a performance evaluation was conducted. Finally, a practical simulation on Oakleaf-FX was performed. Although more investigation is needed, such multi-scale coupled simulations will be important for numerical simulations on the next generation of large-scale computational environments.

48 ppOpen-HPC
ppOpen-MATH
  ppOpen-MATH/MG: Multigrid Solver
  ppOpen-MATH/MP: Coupler
ppOpen-AT
Future/On-going Works

49 ESSEX-II: 2nd Phase for ppOpen-HPC
Japan-German collaboration: ESSEX + U.Tsukuba (Prof. Sakurai) + U.Tokyo (ppOpen-HPC)
Leading PI: Prof. Gerhard Wellein (U. Erlangen)
Preconditioned iterative solvers for quantum science
DLR (German Aerospace Center)
Automatic tuning with performance models (U. Erlangen)

50 Assumptions & Expectations towards the Post-K/Post-Moore Era
Post-K (-2020): memory wall; hierarchical memory (e.g. KNL: MCDRAM + DDR)
Post-Moore (-2025? -2029?): high bandwidth in memory and network; large capacity of memory and cache; large and heterogeneous latency due to deep hierarchy in memory and network; utilization of FPGAs
Common issues: hierarchy and latency (memory, network, etc.); large number of nodes; high concurrency with O(10^3) threads on each node under certain constraints (e.g. power, space)

51 CAE Applications in the Post-K Era
Block-structured AMR, voxel-type FEM: semi-structured/semi-unstructured; the length of inner loops is fixed, or its variation is rather small, so ELL, Sliced ELL, and SELL-C-σ can be applied; better than fully unstructured FEM
Robust/scalable GMG/AMG preconditioning
Limited or fixed number of non-zero off-diagonals in the coefficient matrices
(Figure: storage formats CRS, ELL, Sliced ELL, SELL-C-σ (SELL-2-8), chunk size C; an SpMV sketch for SELL-C-σ follows below.)
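
For reference, an SpMV kernel for the SELL-C-σ format of Kreutzer et al. might look as follows, here with chunk size C = 4 and the σ-sorting step that builds the chunks omitted; the row count is assumed to be padded to a multiple of C. This is a generic sketch, not code from ppOpen-HPC.

```c
/* Sketch of SELL-C-sigma SpMV (C = 4).  Each chunk of C consecutive
 * rows is stored column-major with its own width chunk_len[c]; padded
 * entries carry value 0 and a valid column index.                    */
#define C 4

void spmv_sell_c(int n_chunks, const int *chunk_ptr, const int *chunk_len,
                 const int *col_idx, const double *val,
                 const double *x, double *y)
{
    for (int c = 0; c < n_chunks; c++) {
        double tmp[C] = {0.0, 0.0, 0.0, 0.0};
        int base = chunk_ptr[c];                 /* start of this chunk's data */
        for (int j = 0; j < chunk_len[c]; j++)   /* columns of the chunk       */
            for (int r = 0; r < C; r++)          /* C rows: SIMD-friendly      */
                tmp[r] += val[base + j * C + r] * x[col_idx[base + j * C + r]];
        for (int r = 0; r < C; r++)
            y[c * C + r] = tmp[r];
    }
}
```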

52 Applications & Algorithms in the Post-Moore Era
Compute intensity -> data-movement intensity: implicit schemes strike back! (but not straightforwardly)
Hierarchical methods for hiding latency: Hierarchical Coarse Grid Aggregation (hCGA) in multigrid; Parallel-in-Space/Time (PiST)
Communication/synchronization avoiding/reducing algorithms: network latency is already a big bottleneck for parallel sparse linear solvers (SpMV, dot products); pipelined/asynchronous CG with MPI_Iallreduce in MPI-3 (see the sketch below)
Power-aware methods: approximate computing, power management, FPGAs
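
The MPI-3 ingredient mentioned above can be shown with a minimal overlap pattern: the global reduction for a dot product is started with MPI_Iallreduce and hidden behind local work before its result is needed. This sketch shows only the overlap idea, not a complete pipelined CG; the function-pointer SpMV interface is an assumption for the example.

```c
/* Overlapping a global dot-product reduction with local work using
 * the non-blocking MPI_Iallreduce of MPI-3.                          */
#include <mpi.h>

double overlapped_dot(MPI_Comm comm, const double *u, const double *v, int n,
                      void (*spmv)(const double *, double *, void *),
                      const double *p, double *q, void *mat)
{
    double local = 0.0, global;
    for (int i = 0; i < n; i++)
        local += u[i] * v[i];                 /* local part of the dot product */

    MPI_Request req;
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm, &req);

    spmv(p, q, mat);                          /* useful work hides the latency */

    MPI_Wait(&req, MPI_STATUS_IGNORE);        /* reduction result now available */
    return global;
}
```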

53 Framework for Application Development in the Post-Moore Era
ppOpen-HPC (FY 2011-2015, JST/CREST) + α: linear solvers, AT framework, space/time tiling
Pre/post-processor for Parallel-in-Space/Time (PiST)
(a) nonlinear algorithms, (b) AMR, (c) visualization, (d) coupler for multiphysics
Data movement: challenging!

54 Summary: Visit Booth #510
ppOpen-HPC: porting to Oakforest-PACS (the original target system)
ppOpen-MATH/MG
ppOpen-MATH/MP: coupled earthquake simulations
ESSEX-II
Post-K/Post-Moore issues
Related talk: Akihiro Ida (U.Tokyo), "Application of Hierarchical Matrices with Adaptive Cross Approximation to Large-Scale Simulations", Wednesday, June 22, 2016, 04:10-04:30 pm, Panorama 2, Forum "Algorithms for Extreme Scale in Practice" (03:30-04:30 pm), Chair: David Keyes

55 16th SIAM Conference on Parallel Processing for Scientific Computing (PP18)
March 7-10, 2018, Tokyo, Japan
Venue: Nishiwaseda Campus, Waseda University (near Shinjuku)
Organizing Committee Co-Chairs: Satoshi Matsuoka (Tokyo Institute of Technology, Japan), Kengo Nakajima (The University of Tokyo, Japan), Olaf Schenk (Università della Svizzera italiana, Switzerland)
Contact: Kengo Nakajima, nakajima(at)cc.u-tokyo.ac.jp
