3D Parallel FEM (III) Parallel Visualization using ppopen-math/vis

1 3D Parallel FEM (III): Parallel Visualization using ppOpen-MATH/VIS
Kengo Nakajima
Technical & Scientific Computing I ( ) / Seminar on Computer Science I ( )
Parallel FEM

2 ppOpen-HPC: Open Source Infrastructure for Development and Execution of Large-Scale Scientific Applications with Automatic Tuning (AT)
Kengo Nakajima (Information Technology Center, The University of Tokyo)
Masaki Satoh (AORI/U. Tokyo), Takashi Furumura (ERI/U. Tokyo), Hiroshi Okuda (GS Frontier Sciences/U. Tokyo), Takeshi Iwashita (ACCMS/Kyoto U.), Hide Sakaguchi (JAMSTEC)

3 Post T2K System
- PFLOPS, FY.2015
- Many-core based (e.g. Intel MIC/Xeon Phi only)
- Joint Center for Advanced High Performance Computing (JCAHPC, University of Tsukuba & University of Tokyo)
- Programming is still difficult, although the Intel compiler works (MPI + OpenMP).
- Tuning for performance (e.g. prefetching) is essential.
- Some framework for helping users is needed.

4 Key Issues for Applications/Algorithms towards Post-Peta & Exa Computing
Jack Dongarra (ORNL/U. Tennessee) at ISC 2013
- Heterogeneous/Hybrid Architecture
- Communication/Synchronization-Reducing Algorithms
- Mixed Precision Computation
- Auto-Tuning/Self-Adapting
- Fault-Resilient Algorithms
- Reproducibility of Results

5 ppOpen-HPC (1/3)
- Open source infrastructure for development and execution of large-scale scientific applications on post-peta-scale supercomputers with automatic tuning (AT)
- pp: post-peta-scale
- Five-year project (since April 2011)
- P.I.: Kengo Nakajima (ITC, The University of Tokyo)
- Part of "Development of System Software Technologies for Post-Peta Scale High Performance Computing", funded by JST/CREST (Japan Science and Technology Agency, Core Research for Evolutional Science and Technology); 4.5 M$ for 5 yr.
- Team with 6 institutes, >30 people (5 PDs) from various fields: co-design
  - ITC/U.Tokyo, AORI/U.Tokyo, ERI/U.Tokyo, FS/U.Tokyo, Kyoto U., JAMSTEC

6 (figure)

7 ppOpen-HPC (2/3)
- ppOpen-HPC consists of various types of optimized libraries, which cover various types of procedures for scientific computations.
  - ppOpen-APPL/FEM, FDM, FVM, BEM, DEM
- Source code developed on a PC with a single processor is linked with these libraries, and the generated parallel code is optimized for post-peta-scale systems.
- Users don't have to worry about optimization, tuning, parallelization, etc.
- MPI, OpenMP (OpenACC)

8 (figure)

9 ppOpen-HPC covers (figure)

10 ppOpen-APPL
A set of libraries corresponding to each of the five methods noted above (FEM, FDM, FVM, BEM, DEM), providing:
- I/O: netCDF-based interface
- Domain-to-domain communications (a generic halo-exchange sketch follows this list)
- Optimized linear solvers (preconditioned iterative solvers), optimized for each discretization method
- Matrix assembling
- AMR and dynamic load balancing
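The domain-to-domain communication item above is, in essence, a halo (external-node) exchange over the distributed local meshes. The following is a minimal generic sketch only, not the ppOpen-HPC implementation; the array names (NEIBPETOT, NEIBPE, IMPORT_INDEX/IMPORT_ITEM, EXPORT_INDEX/EXPORT_ITEM) follow the distributed local mesh arrays that appear in the pFEM3D example later in this material, assumed here to be 0-based CSR-style tables.

/* Generic halo exchange for one nodal value array (sketch, not ppOpen-HPC code). */
#include <mpi.h>
#include <stdlib.h>

void halo_exchange(double *val, int NEIBPETOT, const int *NEIBPE,
                   const int *IMPORT_INDEX, const int *IMPORT_ITEM,
                   const int *EXPORT_INDEX, const int *EXPORT_ITEM)
{
  MPI_Request *req = malloc(2 * NEIBPETOT * sizeof(MPI_Request));
  double *sendbuf  = malloc(EXPORT_INDEX[NEIBPETOT] * sizeof(double));
  double *recvbuf  = malloc(IMPORT_INDEX[NEIBPETOT] * sizeof(double));

  for (int n = 0; n < NEIBPETOT; n++) {
    /* pack boundary-node values destined for neighbor NEIBPE[n] */
    for (int k = EXPORT_INDEX[n]; k < EXPORT_INDEX[n + 1]; k++)
      sendbuf[k] = val[EXPORT_ITEM[k]];
    MPI_Isend(&sendbuf[EXPORT_INDEX[n]], EXPORT_INDEX[n + 1] - EXPORT_INDEX[n],
              MPI_DOUBLE, NEIBPE[n], 0, MPI_COMM_WORLD, &req[2 * n]);
    MPI_Irecv(&recvbuf[IMPORT_INDEX[n]], IMPORT_INDEX[n + 1] - IMPORT_INDEX[n],
              MPI_DOUBLE, NEIBPE[n], 0, MPI_COMM_WORLD, &req[2 * n + 1]);
  }
  MPI_Waitall(2 * NEIBPETOT, req, MPI_STATUSES_IGNORE);

  /* unpack received values into the external (halo) nodes */
  for (int n = 0; n < NEIBPETOT; n++)
    for (int k = IMPORT_INDEX[n]; k < IMPORT_INDEX[n + 1]; k++)
      val[IMPORT_ITEM[k]] = recvbuf[k];

  free(req); free(sendbuf); free(recvbuf);
}

A library layer such as ppOpen-APPL hides exactly this kind of bookkeeping behind a single call.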

11 FEM Code on ppOpen-HPC
Optimization/parallelization can be hidden from application developers.

program My_pFEM
  use ppOpenFEM_util
  use ppOpenFEM_solver

  call ppOpenFEM_init
  call ppOpenFEM_cntl
  call ppOpenFEM_mesh
  call ppOpenFEM_mat_init

  do                              ! time-marching loop
    call Users_FEM_mat_ass        ! user-supplied matrix assembly
    call Users_FEM_mat_bc         ! user-supplied boundary conditions
    call ppOpenFEM_solve
    call ppOpenFEM_vis
    Time = Time + DT
  enddo

  call ppOpenFEM_finalize
  stop
end

12 ppOpen-HPC (2/3) (same content as slide 7)

13 ppOpen-HPC (3/3)
- The capability of automatic tuning (AT) enables development of optimized codes and libraries on emerging architectures, based on results from existing architectures and on machine parameters.
  - Memory access, host/co-processor balance, computation/communication overlapping
  - Solvers & libraries in ppOpen-HPC, OpenFOAM, PETSc, etc.
- The target system is the Post T2K system.
  - PFLOPS, FY.2015, many-core based (e.g. Intel MIC/Xeon Phi)
  - ppOpen-HPC helps the smooth transition of users to the new system.

14 (figure)

15 ppOpen-MATH
A set of common numerical libraries:
- Multigrid solvers (ppOpen-MATH/MG)
- Parallel graph libraries (ppOpen-MATH/GRAPH)
- Parallel visualization (ppOpen-MATH/VIS)
- Library for coupled multi-physics simulations (loose coupling) (ppOpen-MATH/MP)
  - Originally developed as a coupler for NICAM (atmosphere, unstructured) and COCO (ocean, structured) in global climate simulations using the K computer; both codes are major codes on the K computer.
    - Prof. Masaki Satoh (AORI/U.Tokyo): NICAM
    - Prof. Hiroyasu Hasumi (AORI/U.Tokyo): COCO
  - The developed coupler has been extended to more general use, e.g. coupled seismic simulations.

16 ppOpen-MATH/MP coupler (figure, c/o T. Arakawa, M. Satoh)
The J-cup / ppOpen-MATH/MP coupler connects atmospheric models (NICAM A-grid/ZM-grid: semi-unstructured grid; MIROC-A: FDM/structured grid) with ocean models (COCO: tri-polar FDM; regional COCO; Matsumura model; non-hydrostatic regional model), providing grid transformation, multi-ensemble support, I/O, pre- and post-processing, fault tolerance, and M:N process coupling, targeting the post-peta-scale system (system software, architecture).

17 Large-Scale Coupled Simulations in FY2013 (c/o T. Furumura)
Challenge (FY2013): a test of a coupled simulation of FDM (regular grid) and FEM (unstructured grid) using the newly developed ppOpen-MATH/MP coupler.
- FDM: seismic wave propagation
  - Model size: 80 x 80 x 400 km; time: 240 s
  - Resolution (space): 0.1 km (regular); resolution (time): 5 ms (effective frequency < 1 Hz)
- FEM: building response
  - Model size: 400 x 400 x 200 m; time: 60 s
  - Resolution (space): 1 m; resolution (time): 1 ms
- ppOpen-MATH/MP: space-time interpolation, mapping between the FDM and FEM meshes, etc. (a minimal sketch of the temporal part follows)
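Because the two codes advance with different time steps (5 ms on the FDM side, 1 ms on the FEM side), the coupler has to interpolate the exchanged field in time as well as in space. The following is a minimal sketch, assuming plain linear interpolation between two consecutive FDM snapshots; the actual ppOpen-MATH/MP remapping is more general and also performs the spatial FDM-to-FEM mapping, so the function name and arguments here are purely illustrative.

/* Linear interpolation in time between two FDM snapshots u0 (at time t0)
 * and u1 (at time t0 + DT_FDM), evaluated at the FEM time t (sketch only). */
#define DT_FDM 0.005    /* FDM time step: 5 ms */

void interp_in_time(const double *u0, const double *u1, double *u_fem,
                    int n, double t0, double t)
{
  double w = (t - t0) / DT_FDM;     /* 0 <= w <= 1 within one FDM interval */
  for (int i = 0; i < n; i++)
    u_fem[i] = (1.0 - w) * u0[i] + w * u1[i];
}

With a 1 ms FEM step, the coupler would evaluate this at w = 0.2, 0.4, ..., 1.0 within each 5 ms FDM interval.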

18 Schedule of Public Release (with English Documents)
We are now focusing on MIC/Xeon Phi.
- 4Q 2012 (Ver.0.1.0): ppOpen-HPC for multicore clusters (Cray, K, etc.); preliminary version of ppOpen-AT/STATIC
- 4Q 2013 (Ver.0.2.0): ppOpen-HPC for multicore clusters & Xeon Phi (& GPU); available at SC13
- 4Q 2014: prototype of ppOpen-HPC for the post-peta-scale system
- 4Q 2015: final version of ppOpen-HPC for the post-peta-scale system; further optimization on the target system

19 ppOpen-HPC Ver.0.1.0
- Released at SC12 (can also be downloaded)
- Multicore cluster version (Flat MPI, OpenMP/MPI hybrid) with documents in English
- Collaborations with scientists

Component: archive
- ppOpen-APPL/FDM: ppohFDM_0.1.0
- ppOpen-APPL/FVM: ppohFVM_0.1.0
- ppOpen-APPL/FEM: ppohFEM_0.1.0
- ppOpen-APPL/BEM: ppohBEM_0.1.0
- ppOpen-APPL/DEM: ppohDEM_0.1.0
- ppOpen-MATH/VIS: ppohVIS_FDM3D_0.1.0
- ppOpen-AT/STATIC: ppohAT_0.1.0

20 What is new in Ver.0.2.0? Available at SC13 (can also be downloaded)
Component: new development
- ppOpen-APPL/FDM: OpenMP/MPI hybrid parallel programming model; Intel Xeon Phi version; interface for ppOpen-MATH/VIS-FDM3D
- ppOpen-APPL/FVM: optimized communication
- ppOpen-APPL/FEM: sample implementations for dynamic solid mechanics
- ppOpen-MATH/MP-PP: tool for generation of the remapping table in ppOpen-MATH/MP
- ppOpen-MATH/VIS: optimized ppOpen-MATH/VIS-FDM3D
- ppOpen-AT/STATIC: sequence of statements, loop splitting (optimized ppOpen-APPL/FVM, ppOpen-APPL/FDM, BEM)

21 ppOpen-MATH/VIS
- Parallel visualization using information of background voxels [Nakajima & Chen 2006]
- The FDM version is released: ppOpen-MATH/VIS-FDM3D
  - Output: a single UCD file
  - Platform: T2K, Cray, FX10; Flat MPI
  - Unstructured/hybrid version: next release
- Control parameters ([Refine] section):
  - AvailableMemory = 2.0   Available memory size (GB); not used in this version
  - MaxVoxelCount = 500     Maximum number of voxels
  - MaxRefineLevel = 20     Maximum number of refinement levels

22 Simplified Parallel Visualization using Background Voxels
- Octree-based AMR
- AMR is applied to regions where the gradient of the field values is large: stress concentration, shock waves, separation, etc.
- If the number of voxels is controlled, a single file with about 10^5 meshes is possible, even though the entire problem size is 10^9 with distributed data sets. (A minimal refinement sketch follows.)
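The refinement decision above can be pictured as follows. This is a minimal sketch, assuming a precomputed field-gradient indicator per voxel and a hypothetical Voxel structure; only MaxVoxelCount and MaxRefineLevel correspond to the ppOpen-MATH/VIS control parameters shown on the previous slide, while the threshold and helper logic are illustrative and not the library's actual algorithm.

/* Octree-style background-voxel refinement driven by a field-gradient
 * indicator (illustrative sketch, not ppOpen-MATH/VIS code). */
#include <stdlib.h>

typedef struct Voxel {
  double grad;             /* field-gradient magnitude in this voxel       */
  int    level;            /* refinement level (0 = initial background)    */
  struct Voxel *child[8];  /* octants after refinement; NULL while a leaf  */
} Voxel;

/* Split one leaf voxel if the local gradient is large, the level cap is
 * not reached, and the global voxel budget allows 8 more voxels.
 * Returns 1 if the voxel was refined. */
int refine(Voxel *v, double grad_threshold,
           int MaxRefineLevel, int *voxel_count, int MaxVoxelCount)
{
  if (v->level >= MaxRefineLevel)       return 0;  /* level cap reached    */
  if (v->grad  <  grad_threshold)       return 0;  /* smooth region: keep  */
  if (*voxel_count + 8 > MaxVoxelCount) return 0;  /* voxel budget reached */

  for (int i = 0; i < 8; i++) {
    v->child[i] = calloc(1, sizeof(Voxel));
    v->child[i]->level = v->level + 1;
    v->child[i]->grad  = v->grad;   /* child gradients are recomputed later */
  }
  *voxel_count += 8;
  return 1;
}

Capping the total count (MaxVoxelCount) is what keeps the merged visualization output small enough for a single file.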

23 FEM Mesh (SW Japan Model)

24 Voxel Mesh (initial)

25 Voxel Mesh (2-level adapted)

26 Example of Surface Simplification
Initial (11,884 triangles), 50% reduction (5,942), 95% reduction (594), 98% reduction (238)

27 pfem-vis: pFEM3D + ppOpen-MATH/VIS (Files)

>$ cd <$O-fem2>
>$ cp /home/z30088/fem2/c/fem3dv.tar .
>$ tar xvf fem3dv.tar
>$ cd <$O-fem2>/fem3dv/src
>$ cd ../run
>$ pjsub go.sh

28 pfem-vis: Makefile

PPOHVISDIR = /home/z30088/fem2/ppohvis_pfem3dstr_140115/ppohvis/
LIB_DIR = $(PPOHVISDIR)/lib
INC_DIR = $(PPOHVISDIR)/include
CFLAGSL = -I/home/z30088/ppohVIS_test/include
LDFLAGSL = -L/home/z30088/ppohVIS_test/lib
LIBSL = -lppohvisfdm3d
OPTFLAGS = -Kfast -I$(INC_DIR)
LFLAGS = -L$(LIB_DIR) $(LIBS) -lm
#
.SUFFIXES:
.SUFFIXES: .o .c
.c.o:
	$(CC) $(OPTFLAGS) -c $< -o $@
#
TARGET = ../run/sol
default: $(TARGET)
OBJS = test1_p.o pfem_init.o ...
$(TARGET): $(OBJS)
	$(CC) $(OPTFLAGS) -o $@ $(OBJS) $(LFLAGS)
clean:
	/bin/rm -f *.o *~ $(TARGET)

29 pfem-vis: <$O-fem2>/fem3dv/run

Files in the run directory:
cube_32x32x32_8pe.0
cube_32x32x32_8pe.1
cube_32x32x32_8pe.2
cube_32x32x32_8pe.3
cube_32x32x32_8pe.4
cube_32x32x32_8pe.5
cube_32x32x32_8pe.6
cube_32x32x32_8pe.7
go.sh
INPUT.DAT
vis.cnt

Mesh file header: cube_32x32x32_8pe

go.sh:
#!/bin/sh
#PJM -L "node=1"
#PJM -L "elapse=00:10:00"
#PJM -L "rscgrp=lecture"
#PJM -g "gt74"
#PJM -j
#PJM -o "cube8.lst"
#PJM --mpi "proc=8"
mpiexec ./sol

30 pfem-vis: pFEM3D + ppOpen-MATH/VIS (data flow)
sol (linked with ppohVIS) reads INPUT.DAT, the distributed local mesh files <HEADER>.*, and vis.cnt; its output files vis_disp.1.inp and vis_sigma-zz.1.inp are then loaded into ParaView.

31 pfem-vis: C/main (1/2)

#include <stdio.h>
#include <stdlib.h>
FILE* fp_log;
#define GLOBAL_VALUE_DEFINE
#include "pfem_util.h"
#include "ppohvis_pfem3d_util.h"

extern void PFEM_INIT(int,char**);
extern void INPUT_CNTL();
extern void INPUT_GRID();
extern void MAT_CON0();
extern void MAT_CON1();
extern void MAT_ASS_MAIN();
extern void MAT_ASS_BC();
extern void SOLVE33();
extern void RECOVER_STRESS();
extern void PFEM_FINALIZE();

int main(int argc, char* argv[])
{
  double START_TIME, END_TIME;
  struct ppohvis_base_stcontrol *pcontrol = NULL;
  struct ppohvis_base_stresultcollection *pnoderesult = NULL;

  PFEM_INIT(argc, argv);                            /* initialize pFEM3D               */
  ppohvis_pfem3d_init(MPI_COMM_WORLD);              /* initialize ppohVIS              */

  INPUT_CNTL();
  pcontrol = ppohvis_pfem3d_getcontrol("vis.cnt");  /* read visualization control file */

  INPUT_GRID();
  /* pass the distributed local mesh (nodes, elements, communication tables) to ppohVIS */
  if(ppohvis_pfem3d_setmeshex(
       NP, N, NODE_ID, XYZ,
       ICELTOT, ICELTOT_INT, ELEM_ID, ICELNOD,
       NEIBPETOT, NEIBPE, IMPORT_INDEX, IMPORT_ITEM, EXPORT_INDEX, EXPORT_ITEM)) {
    ppohvis_base_printerror(stderr);
    MPI_Abort(MPI_COMM_WORLD, errno);
  };

32 pfem-vis: C/main (2/2)

  SOLVE33();
  RECOVER_STRESS();

  pnoderesult = ppohvis_base_allocateresultcollection();
  if(pnoderesult == NULL) {
    ppohvis_base_printerror(stderr);
    MPI_Abort(MPI_COMM_WORLD, errno);
  };

  /* register nodal results: displacement (3 components) and the sigma-zz component */
  pnoderesult->results[0] =
    ppohvis_pfem3d_convresultnodeitem(NP, 3, "disp", X);
  pnoderesult->results[1] =
    ppohvis_pfem3d_convresultnodeitempart(NP, 3, 2, "sgm-zz", SIGMA_N);

  /* generate the simplified visualization files (prefix "vis") */
  if(ppohvis_pfem3d_visualize(pnoderesult, NULL, pcontrol, "vis", 1)) {
    ppohvis_base_printerror(stderr);
    MPI_Abort(MPI_COMM_WORLD, errno);
  };

  ppohvis_pfem3d_finalize();
  PFEM_FINALIZE();
}

33 pfem-vis: vis.cnt

[Refine]                  Section for refinement control
AvailableMemory = 2.0     Available memory size (GB); not in use
MaxVoxelCount = 1000      Max voxel #
MaxRefineLevel = 20       Max voxel refinement level
[Simple]                  Section for simplification control
ReductionRate = 0.0       Reduction rate of surface patches

Original mesh: 32,768 elements, 39,949 nodes; visualization output: 554 elements, 828 nodes.
Deformed-shape figures are not supported.
