Introduction to Parallel Programming for Multicore/Manycore Clusters
1 Introduction to Parallel Programming for Multicore/Manycore Clusters: Introduction. Kengo Nakajima, Information Technology Center, The University of Tokyo
2 Descriptions of the Class
- Technical & Scientific Computing I (科学技術計算Ⅰ), Department of Mathematical Informatics
- Special Lecture on Computational Science I (計算科学アライアンス特別講義Ⅰ), Department of Computer Science
- Multithreaded Parallel Computing (スレッド並列コンピューティング, from 2016), Department of Electrical Engineering & Information Systems
This class is certificated as a "Category F" lecture of the Computational Alliance, the University of Tokyo.
3 Introduction to FEM Programming
FEM: Finite Element Method (有限要素法)
Summer (I): FEM programming for solid mechanics; Winter (II): parallel FEM using MPI. The 1st part (summer) is essential for the 2nd part (winter).
Problems: many new (international) students join in Winter who did not take the 1st part in Summer (they are generally more diligent than Japanese students).
2015: Summer (I): multicore programming with OpenMP; Winter (II): FEM + parallel FEM with MPI/OpenMP for heat transfer; Parts I & II are independent (maybe...).
2017: lectures are given in English; Reedbush-U supercomputer system.
4 Motivation for Parallel Computing (and this class)
Large-scale parallel computers enable fast computing in large-scale scientific simulations with detailed models. Computational science develops new frontiers of science and engineering.
Why parallel computing? Faster & larger. "Larger" is more important from the viewpoint of new frontiers of science & engineering, but "faster" is also important. Plus: more complicated problems.
Ideal: scalable.
- Weak scaling: solving an N-times larger problem using N times the computational resources in the same computation time.
- Strong scaling: solving a fixed-size problem using N times the computational resources in 1/N the computation time.
5 Scientific Computing = SMASH: Science, Modeling, Algorithm, Software, Hardware.
You have to learn many things. Collaboration (or co-design) will be important for the future career of each of you, as a scientist and/or an engineer. You have to communicate with people with different backgrounds; it is more difficult than communicating with foreign scientists from the same area.
(Q): Computer Science, Computational Science, or Numerical Algorithms?
6 This Class... (Science, Modeling, Algorithm, Software, Hardware)
Target: parallel FVM (Finite-Volume Method) using OpenMP. Science: 3D Poisson equations; Modeling: FVM; Algorithm: iterative solvers etc.
You have to know many components to learn FVM, although you have already learned each of these in undergraduate and high-school classes.
7 Road to Programming for Parallel Scientific Computing
4. Programming for Parallel Scientific Computing (e.g. parallel FEM/FDM)
3. Programming for Real-World Scientific Computing (e.g. FEM, FDM)
   (Big gap here!!)
2. Programming for Fundamental Numerical Analysis (e.g. Gauss-Seidel, RK etc.)
1. Unix, Fortran, C etc.
8 The third step is important!
How to parallelize applications? How to extract parallelism? If you understand the methods, algorithms, and implementation of the original code, it is easy. Data structure is important.
4. Programming for Parallel Scientific Computing (e.g. parallel FEM/FDM)
3. Programming for Real-World Scientific Computing (e.g. FEM, FDM)
2. Programming for Fundamental Numerical Analysis (e.g. Gauss-Seidel, RK etc.)
1. Unix, Fortran, C etc.
How to understand the code? Read the application code!! It seems primitive, but it is very effective. In this class, reading the source code is encouraged. (3: FVM, 4: parallel FVM)
9 Kengo Nakajima 中島研吾 (1/2)
Current positions:
- Professor, Supercomputing Research Division, Information Technology Center, The University of Tokyo (情報基盤センター)
- Department of Mathematical Informatics, Graduate School of Information Science & Engineering, The University of Tokyo (情報理工・数理情報学)
- Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo (工・電気系工学)
- Deputy Director, RIKEN CCS (Center for Computational Science), Kobe (20%)
Research interests:
- High-performance computing
- Parallel numerical linear algebra (preconditioning)
- Parallel programming models
- Computational mechanics, computational fluid dynamics
- Adaptive mesh refinement, parallel visualization
10 Kengo Nakajima (2/2)
Education:
- B.Eng. (Aeronautics, The University of Tokyo, 1985)
- M.S. (Aerospace Engineering, University of Texas, 1993)
- Ph.D. (Quantum Engineering & System Sciences, The University of Tokyo, 2003)
Professional experience:
- Mitsubishi Research Institute, Inc. ( )
- Research Organization for Information Science & Technology ( )
- The University of Tokyo: Department of Earth & Planetary Science ( ); Information Technology Center (2008-)
- JAMSTEC ( ), part-time; RIKEN ( ), part-time
11 Supercomputers and Computational Science / Overview of the Class / Future Issues
12 Computer & CPU
Central Processing Unit (中央処理装置): the CPUs used in PCs and in supercomputers are based on the same architecture.
GHz: clock rate (frequency), the number of operations by the CPU per second; 1 GHz -> 10^9 operations/sec. 4-8 simultaneous instructions per clock cycle.
13 Core (コア) = central part of the CPU
Single-core: 1 core/CPU; dual-core: 2 cores/CPU; quad-core: 4 cores/CPU.
Multicore CPUs with 4-8 cores are popular; low power.
GPU: manycore, O(10^1)-O(10^2) cores. More and more cores -> parallel computing.
Reedbush-U: Intel Xeon Broadwell-EP, 18 cores x 2 sockets.
14 GPU/Manycores
GPU: Graphics Processing Unit; GPGPU: General-Purpose GPU. O(10^2) cores; high memory bandwidth; cheap; NO stand-alone operation (a host CPU is needed); programming: CUDA, OpenACC.
Intel Xeon Phi: manycore, 60+ cores; high memory bandwidth; Unix, Fortran and C compilers; a host CPU was needed in the 1st generation; stand-alone operation is possible now (Knights Landing, KNL).
15 Parallel Supercomputers: multicore CPUs are connected through a network.
16 Supercomputers with Heterogeneous/Hybrid Nodes: each node combines a CPU with GPU/manycore accelerators.
17 Performance of Supercomputers
Performance of CPU: clock rate; FLOPS (Floating-point Operations Per Second), i.e. real-number operations.
Recent multicore CPUs: 4-8 FLOP operations per clock cycle.
(e.g.) Peak performance of a core with a 3 GHz clock: 3 x 4 (or 3 x 8) = 12 (or 24) x 10^9 FLOPS = 12 (or 24) GFLOPS.
10^6 FLOPS = 1 Mega FLOPS = 1 MFLOPS
10^9 FLOPS = 1 Giga FLOPS = 1 GFLOPS
10^12 FLOPS = 1 Tera FLOPS = 1 TFLOPS
10^15 FLOPS = 1 Peta FLOPS = 1 PFLOPS
10^18 FLOPS = 1 Exa FLOPS = 1 EFLOPS
18 Peak Performance of Reedbush-U
Intel Xeon E5-2695 v4 (Broadwell-EP), 2.1 GHz; 16 DP (double-precision) FLOP operations per clock.
Peak performance (1 core) = 2.1 x 16 = 33.6 GFLOPS.
Peak performance: 1 socket, 18 cores: 604.8 GFLOPS; 2 sockets, 36 cores (1 node): 1,209.6 GFLOPS.
19 TOP 500 List
Ranking list of supercomputers in the world. Performance (FLOPS rate) is measured by Linpack, which solves large-scale linear equations. Published since 1993; updated twice a year (at international conferences in June and November). An iPhone version of Linpack is available.
20 PFLOPS: Peta (= 10^15) FLoating-point OPerations per Second. Exa-FLOPS (= 10^18) will be attained after 2021.
21 Benchmarks
TOP 500 (Linpack, HPL: High-Performance Linpack): direct linear solvers, FLOPS rate; regular dense matrices, continuous memory access; measures computing performance.
HPCG: preconditioned iterative solvers, FLOPS rate; irregular sparse matrices derived from FEM applications, with many zero components; irregular/random memory access, closer to real applications than HPL; measures performance of memory and communications.
Green 500: FLOPS/W rate for HPL (TOP500).
22 50th TOP500 List (November, 2017)
1. National Supercomputing Center in Wuxi, China: Sunway TaihuLight, Sunway MPP, Sunway SW26010 260C 1.45GHz, 2016, NRCPC. 10,649,600 cores; Rmax 93,015 (= 93.0 PF); Rpeak 125,436; 15,371 kW
2. National Supercomputing Center in Tianjin, China: Tianhe-2, Intel Xeon E5-2692, TH Express-2, Xeon Phi, 2013, NUDT. 3,120,000 cores; Rmax 33,863 (= 33.9 PF); Rpeak 54,902; 17,808 kW
3. Swiss Natl. Supercomputer Center, Switzerland: Piz Daint, Cray XC30/NVIDIA P100, 2013, Cray. 361,760 cores; Rmax 19,590; Rpeak 25,326; 2,272 kW
4. JAMSTEC, Japan: Gyoukou, ZettaScaler-2.2 HPC, Xeon D-1571, 2017, ExaScaler. 19,860,000 cores; Rmax 19,136; Rpeak 28,192; 1,350 kW
5. Oak Ridge National Laboratory, USA: Titan, Cray XK7/NVIDIA K20x, 2012, Cray. 560,640 cores; Rmax 17,590; Rpeak 27,113; 8,209 kW
6. Lawrence Livermore National Laboratory, USA: Sequoia, BlueGene/Q, 2011, IBM. 1,572,864 cores; Rmax 17,173; Rpeak 20,133; 7,890 kW
7. DOE/NNSA/LANL/SNL, USA: Trinity, Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Cray Aries, 2017, Cray. 979,968 cores; Rmax 14,137; Rpeak 43,903; 3,844 kW
8. DOE/SC/LBNL/NERSC, USA: Cori, Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Cray Aries, 2016, Cray. 632,400 cores; Rmax 14,015; Rpeak 27,881; 3,939 kW
9. Joint Center for Advanced High Performance Computing, Japan: Oakforest-PACS, PRIMERGY CX600 M1, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path, 2016, Fujitsu. 557,056 cores; Rmax 13,555; Rpeak 24,914; 2,719 kW
10. RIKEN AICS, Japan: K computer, SPARC64 VIIIfx, 2011, Fujitsu. 705,024 cores; Rmax 10,510; Rpeak 11,280; 12,660 kW
Rmax: Linpack performance (TFLOPS); Rpeak: peak performance (TFLOPS); Power: kW
23 HPCG Ranking (November, 2017)
1. K computer: 705,024 cores; HPL Rmax 10.510 PF (TOP500 #10); HPCG 0.603 PF; 5.3% of peak
2. Tianhe-2 (MilkyWay-2): 3,120,000 cores; HPL 33.863 PF (#2); HPCG 0.580 PF; 1.1%
3. Trinity: 979,968 cores; HPL 14.137 PF (#7); HPCG 0.546 PF; 1.2%
4. Piz Daint: 361,760 cores; HPL 19.590 PF (#3); HPCG 0.486 PF; 1.9%
5. Sunway TaihuLight: 10,649,600 cores; HPL 93.015 PF (#1); HPCG 0.481 PF; 0.4%
6. Oakforest-PACS: 557,056 cores; HPL 13.555 PF (#9); HPCG 0.385 PF; 1.5%
7. Cori: 632,400 cores; HPL 14.015 PF (#8); HPCG 0.355 PF; 1.3%
8. Sequoia: 1,572,864 cores; HPL 17.173 PF (#6); HPCG 0.330 PF; 1.6%
9. Titan: 560,640 cores; HPL 17.590 PF (#5); HPCG 0.322 PF; 1.2%
10. TSUBAME 3.0: 136,080 cores; HPL 8.125 PF (#13); HPCG 0.189 PF; 1.6%
24 Green 500 Ranking (November, 2016): GFLOPS/W for HPL
1. NVIDIA Corporation: DGX SATURNV (NVIDIA DGX-1, Xeon E5-2698v4 20C 2.2GHz, Infiniband EDR, NVIDIA Tesla P100)
2. Swiss National Supercomputing Centre (CSCS): Piz Daint (Cray XC50, Xeon E5-2690v3 12C 2.6GHz, Aries interconnect, NVIDIA Tesla P100)
3. RIKEN ACCS: Shoubu (ZettaScaler-1.6 etc.)
4. National SC Center in Wuxi: Sunway TaihuLight (Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway)
5. SFB/TR55 at Fujitsu Tech. Solutions GmbH: QPACE3 (PRIMERGY CX1640 M1, Intel Xeon Phi 64C 1.3GHz, Intel Omni-Path)
6. JCAHPC: Oakforest-PACS (PRIMERGY CX1640 M1, Intel Xeon Phi 68C 1.4GHz, Intel Omni-Path)
7. DOE/SC/Argonne National Lab.: Theta (Cray XC40, Intel Xeon Phi 64C 1.3GHz, Aries interconnect)
8. Stanford Research Computing Center: XStream (Cray CS-Storm, Intel Xeon E5-2680v2 10C 2.8GHz, Infiniband FDR, Nvidia K80)
9. ACCMS, Kyoto University: Camphor 2 (Cray XC40, Intel Xeon Phi 68C 1.4GHz, Aries interconnect)
10. Jefferson Natl. Accel. Facility: SciPhi XVI (KOI Cluster, Intel Xeon Phi 64C 1.3GHz, Intel Omni-Path)
25 Green 500 Ranking (June, 2017)
1. Tokyo Tech.: TSUBAME3.0 (SGI ICE XA, IP139-SXM2, Xeon E5-2680v4, NVIDIA Tesla P100 SXM2, HPE)
2. Yahoo Japan: kukai (ZettaScaler-1.6, Xeon E5-2650Lv4, NVIDIA Tesla P100, Exascaler)
3. AIST, Japan: AIST AI Cloud (NEC 4U-8GPU Server, Xeon E5-2630Lv4, NVIDIA Tesla P100 SXM2, NEC)
4. CAIP, RIKEN, JAPAN: RAIDEN GPU subsystem (NVIDIA DGX-1, Xeon E5-2698v4, NVIDIA Tesla P100, Fujitsu)
5. Univ. Cambridge, UK: Wilkes-2 (Dell C4130, Xeon E5-2650v4, NVIDIA Tesla P100, Dell)
6. Swiss Natl. SC. Center (CSCS): Piz Daint (Cray XC50, Xeon E5-2690v3, NVIDIA Tesla P100, Cray Inc.)
7. JAMSTEC, Japan: Gyoukou (ZettaScaler-2.0 HPC system, Xeon D-1571, PEZY-SC2, ExaScaler)
8. Inst. for Env. Studies, Japan: GOSAT-2 (RCF2) (SGI Rackable C1104-GP1, Xeon E5-2650v4, NVIDIA Tesla P100, NSSOL/HPE)
9. Facebook, USA: Penguin Relion (Xeon E5-2698v4/E5-2650v4, NVIDIA Tesla P100, Acer Group)
10. NVIDIA, USA: DGX Saturn V (Xeon E5-2698v4, NVIDIA Tesla P100, Nvidia)
11. ITC, U.Tokyo, Japan: Reedbush-H (SGI Rackable C1102-GP8, Xeon E5-2695v4, NVIDIA Tesla P100 SXM2, HPE)
26 Green 500 Ranking (November, 2017)
1. RIKEN, Japan: Shoubu system B (ZettaScaler-2.2 HPC system, Xeon D-1571, PEZY-SC2, ExaScaler)
2. KEK, Japan: Suiren2 (ZettaScaler-2.2 HPC system, Xeon D-1571, PEZY-SC2, ExaScaler)
3. PEZY, Japan: Sakura (ZettaScaler-2.2 HPC system, Xeon E5-2618Lv3, PEZY-SC2, ExaScaler)
4. NVIDIA, USA: DGX Saturn V Volta (Xeon E5-2698v4, NVIDIA Tesla V100, Nvidia)
5. JAMSTEC, Japan: Gyoukou (ZettaScaler-2.2 HPC system, Xeon D-1571, PEZY-SC2, ExaScaler)
6. Tokyo Tech.: TSUBAME3.0 (SGI ICE XA, IP139-SXM2, Xeon E5-2680v4, NVIDIA Tesla P100 SXM2, HPE)
7. AIST, Japan: AIST AI Cloud (NEC 4U-8GPU Server, Xeon E5-2630Lv4, NVIDIA Tesla P100 SXM2, NEC)
8. CAIP, RIKEN, JAPAN: RAIDEN GPU subsystem (NVIDIA DGX-1, Xeon E5-2698v4, NVIDIA Tesla P100, Fujitsu)
9. Univ. Cambridge, UK: Wilkes-2 (Dell C4130, Xeon E5-2650v4, NVIDIA Tesla P100, Dell)
10. Swiss Natl. SC. Center (CSCS): Piz Daint (Cray XC50, Xeon E5-2690v3, NVIDIA Tesla P100, Cray Inc.)
11. ITC, U.Tokyo, Japan: Reedbush-L (SGI Rackable C1102-GP8, Xeon E5-2695v4, NVIDIA Tesla P100 SXM2, HPE)
27 Computational Science: The 3rd Pillar of Science
Alongside theoretical and experimental science, computational science (simulations using supercomputers) is the third pillar of science.
28 Methods for Scientific Computing
Numerical solutions of PDEs (partial differential equations); grids, meshes, particles; large-scale linear equations. Finer meshes provide more accurate solutions.
29 3D Simulations for Earthquake Generation Cycle San Andreas Faults, CA, USA Stress Accumulation at Transcurrent Plate Boundaries 29
30 Adaptive FEM: High-resolution needed at meshes with large deformation (large accumulation) 30
31 31 Typhoon Simulations by FDM Effect of Resolution h=100km h=50km h=5km [JAMSTEC]
32 32 Simulation of Typhoon MANGKHUT in 2003 using the Earth Simulator [JAMSTEC]
33 33 Simulation of Geologic CO 2 Storage [Dr. Hajime Yamamoto, Taisei]
34 34 Simulation of Geologic CO 2 Storage International/Interdisciplinary Collaborations Taisei (Science, Modeling) Lawrence Berkeley National Laboratory, USA (Modeling) Information Technology Center, the University of Tokyo (Algorithm, Software) JAMSTEC (Earth Simulator Center) (Software, Hardware) NEC (Software, Hardware) 2010 Japan Geotechnical Society (JGS) Award
35 Simulation of Geologic CO2 Storage
Science: behavior of CO2 in a supercritical state at a deep reservoir.
PDEs: 3D multiphase flow (liquid/gas) + 3D mass transfer.
Method for computation: the TOUGH2 code, based on FVM and developed by Lawrence Berkeley National Laboratory, USA. More than 90% of the computation time is spent solving large-scale linear equations with more than 10^7 unknowns.
Numerical algorithm: a fast algorithm for large-scale linear equations developed by the Information Technology Center, the University of Tokyo.
Supercomputers: Earth Simulator II (NEC SX-9, JAMSTEC, 130 TFLOPS); Oakleaf-FX (Fujitsu PRIMEHPC FX10, U.Tokyo, 1.13 PFLOPS).
36 Diffusion-Dissolution-Convection Process
Injection well; caprock (low-permeability seal); supercritical CO2; convective mixing. Buoyant scCO2 overrides onto groundwater; dissolution of CO2 increases water density; denser fluid lies on lighter fluid; a Rayleigh-Taylor instability invokes convective mixing of the groundwater. The mixing significantly enhances CO2 dissolution into groundwater, resulting in more stable storage. Preliminary 2D simulation (Yamamoto et al., GHGT11). [Dr. Hajime Yamamoto, Taisei] COPYRIGHT TAISEI CORPORATION ALL RIGHTS RESERVED
37 Density convections for 1,000 years: flow model. Only the far side of the vertical cross-section passing through the injection well is depicted. The meter-scale fingers gradually developed into larger ones in the field-scale model. A huge number of time steps (> 10^5) was required to complete the 1,000-year simulation. The onset time (10-20 yrs) is comparable to theory (linear stability analysis: 15.5 yrs). [Dr. Hajime Yamamoto, Taisei]
38 Simulation of Geologic CO2 Storage: calculation time. 30 million DoF (10 million grids x 3 DoF/grid node); average time for solving the matrix for one time step vs. number of processors; TOUGH2-MP on FX10. Fujitsu FX10 (Oakleaf-FX), 30M DoF: 2x-3x improvement. 3D multiphase flow (liquid/gas) + 3D mass transfer. [Dr. Hajime Yamamoto, Taisei]
39 Motivation for Parallel Computing, again
Large-scale parallel computers enable fast computing in large-scale scientific simulations with detailed models. Computational science develops new frontiers of science and engineering.
Why parallel computing? Faster & larger. "Larger" is more important from the viewpoint of new frontiers of science & engineering, but "faster" is also important. Plus: more complicated problems.
Ideal: scalable.
- Weak scaling: solving an N-times larger problem using N times the computational resources in the same computation time.
- Strong scaling: solving a fixed-size problem using N times the computational resources in 1/N the computation time.
40 Supercomputers and Computational Science / Overview of the Class / Future Issues
41 Our Current Target: Multicore Cluster
Multicore CPUs are connected through a network; each CPU has its own memory.
OpenMP: multithreading; intra-node (intra-CPU); shared memory.
MPI: message passing; inter-node (inter-CPU); distributed memory.
43 Our Current Target: Multicore Cluster
Multicore CPUs are connected through a network; each CPU has its own memory.
OpenMP: multithreading; intra-node (intra-CPU); shared memory.
MPI (after October): message passing; inter-node (inter-CPU); distributed memory.
44 Flat MPI vs. Hybrid
Flat MPI: each core runs an independent MPI process; MPI only, both intra- and inter-node.
Hybrid: hierarchical structure; OpenMP multithreading within each node (shared memory) + MPI between nodes (distributed memory).
45 Example of OpenMP/MPI Hybrid: sending messages to neighboring processes
MPI: message passing; OpenMP: threading with directives.

!C
!C-- SEND
      do neib= 1, NEIBPETOT
        II    = (LEVEL-1)*NEIBPETOT
        istart= STACK_EXPORT(II+neib-1)
        inum  = STACK_EXPORT(II+neib  ) - istart
!$omp parallel do
        do k= istart+1, istart+inum
          WS(k-NE0)= X(NOD_EXPORT(k))
        enddo
        call MPI_Isend (WS(istart+1-NE0), inum, MPI_DOUBLE_PRECISION, &
     &                  NEIBPE(neib), 0, MPI_COMM_WORLD,              &
     &                  req1(neib), ierr)
      enddo
46 In order to make full use of modern supercomputer systems with multicore/manycore architectures, hybrid parallel programming with message passing and multithreading is essential. MPI for message passing and OpenMP for multithreading are the most popular approaches for parallel programming on multicore/manycore clusters. In this class, we parallelize a finite-volume method code with Krylov iterative solvers for Poisson's equation on the Reedbush-U supercomputer (Intel Broadwell-EP) at the University of Tokyo. Due to time limitations, we (mainly) focus on multithreading with OpenMP. The ICCG solver (Conjugate Gradient iterative solver with Incomplete Cholesky preconditioning) is a widely used method for solving linear equations. Because it contains data dependencies, where reads from and writes to memory can occur simultaneously, its parallelization using OpenMP is not straightforward.
47 We need a certain kind of reordering in order to extract parallelism.
Lectures and exercises on the following issues related to OpenMP will be conducted:
- Finite-Volume Method (FVM)
- Krylov iterative method
- Preconditioning
- Implementation of the program
- Introduction to OpenMP
- Reordering/coloring method
- Parallel FVM code using OpenMP
48 Schedule (Date / ID / Title)
Apr-09 (M) CS-01 Introduction
Apr-16 (M) CS-02 FVM (1/3)
Apr-23 (M) CS-03 FVM (2/3)
Apr-30 (M) (no class: national holiday)
May-07 (M) CS-04 FVM (3/3)
May-14 (M) CS-05 Login to Reedbush-U, OpenMP (1/3)
May-21 (M) CS-06 OpenMP (2/3)
May-28 (M) CS-07 OpenMP (3/3)
Jun-04 (M) CS-08 Reordering (1/2)
Jun-11 (M) CS-09 Reordering (2/2)
Jun-18 (M) CS-10 Tuning
Jun-25 (M) (canceled)
Jul-02 (M) CS-11 Parallel Code by OpenMP (1/2)
Jul-09 (M) CS-12 Parallel Code by OpenMP (2/2)
Jul-23 (M) CS-13 Exercises
49 Prerequisites
- Fundamental physics and mathematics: linear algebra, analysis
- Experience with fundamental numerical algorithms: LU factorization/decomposition, Gauss-Seidel
- Experience in programming in C or Fortran
- Experience with and knowledge of UNIX
- A user account for ECCS2016 must be obtained:
50 Strategy
If you can develop programs by yourself, it is ideal... but difficult. This class focuses on reading programs, not on developing them by yourself. Programs are in C and Fortran.
Lecture materials are available at noon on Friday through the web; no hardcopy is provided.
Classes start at 08:30; you can enter the building after 08:00. Take seats from the front row. Terminals must be shut down after class.
51 Grades: one or two reports on programming.
52 If you have any questions, please feel free to contact me!
Office: 3F Annex, Information Technology Center, Room #36 (情報基盤センター別館 3F 36号室), ext.:
E-mail: nakajima(at)cc.u-tokyo.ac.jp
No specific office hours; make an appointment by e-mail.
Japanese-language materials (partial): 日本語資料(一部)
53 Keywords for OpenMP
OpenMP: directive-based, (seems to be) easy; many books.
Data dependency: conflicts of reading from/writing to memory; appropriate reordering of data is needed for consistent parallel computing; there is no detailed information in OpenMP books (it is very complicated).
54 Some Technical Terms
- Processor, CPU: processing unit (hardware); processor = CPU for single-core processors.
- Process: unit of MPI computation, nearly equal to a core. Each core (or processor) can host multiple processes (but this is not efficient).
- PE (Processing Element): originally means processor, but it is sometimes used to mean process in this class; moreover it can mean domain (next). In multicore processors, PE generally means core.
- Domain: domain = process (= PE); each of the MD in SPMD; each data set.
Process IDs of MPI (ID of PE, ID of domain) start from 0: if you have 8 processes (PEs, domains), the IDs are 0-7.
55 Supercomputers and Computational Science / Overview of the Class / Future Issues
56 Key Issues towards Applications/Algorithms on Exa-Scale Systems (Jack Dongarra, ORNL/U. Tennessee, at ISC 2013)
- Hybrid/heterogeneous architectures: multicore + GPU/manycore (Intel MIC/Xeon Phi)
- Data movement, hierarchy of memory
- Communication/synchronization-reducing algorithms
- Mixed-precision computation
- Auto-tuning/self-adapting
- Fault-resilient algorithms
- Reproducibility of results
57 Supercomputers with Heterogeneous/Hybrid Nodes: each node combines a CPU with GPU/manycore accelerators.
58 Hybrid Parallel Programming Models are essential for Post-Peta/Exascale Computing
Message passing (e.g. MPI) + multithreading (e.g. OpenMP, CUDA, OpenCL, OpenACC etc.). On the K computer and FX10, hybrid parallel programming is recommended: MPI + automatic parallelization by Fujitsu's compiler.
Expectations for hybrid:
- The number of MPI processes (and sub-domains) is reduced; O( )-way MPI might not scale on exascale systems.
- Easily extended to heterogeneous architectures: +GPU, +manycores (e.g. Intel MIC/Xeon Phi); MPI+X: OpenMP, OpenACC, CUDA, OpenCL.
59 This class is also useful for this type of parallel system (nodes with CPU + GPU/manycore accelerators).
60 Parallel Programming Models
- Multicore clusters (e.g. K, FX10): MPI + OpenMP and (Fortran/C/C++).
- Multicore CPU + GPU (e.g. Tsubame): the GPU needs a host CPU. MPI and [(Fortran/C/C++) + CUDA, OpenCL]: complicated. MPI and [(Fortran/C/C++) with OpenACC]: close to MPI + OpenMP and (Fortran/C/C++).
- Multicore CPU + Intel MIC/Xeon Phi, or Intel MIC/Xeon Phi only: MPI + OpenMP and (Fortran/C/C++) is possible, plus vectorization.
61 Future of Supercomputers (1/2)
Technical issues: power consumption; reliability, fault tolerance, fault resilience; scalability (parallel performance).
Petascale system: 2 MW including A/C, 2 M$/year, O(10^5-10^6) cores.
Exascale system (10^3 x petascale), after 2021 (?): 2 GW (2 B$/year!), O(10^8-10^9) cores. Various types of innovations are on-going to keep power consumption at 20 MW (100x efficiency): CPU, memory, network... and reliability.
62 Future of Supercomputers (2/2)
Not only hardware, but also numerical models and algorithms must be improved:
- Power-aware/power-reducing algorithms (省電力)
- Fault-resilient algorithms (耐故障)
- Communication-avoiding/reducing algorithms (通信削減)
Co-design by experts from various areas (SMASH) is important. The exascale system will be a special-purpose system, not a general-purpose one.
Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Quinn Mitchell HPC UNIX/LINUX Storage Systems ORNL is managed by UT-Battelle for the US Department of Energy U.S. Department
More informationJack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester
Jack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester 11/20/13 1 Rank Site Computer Country Cores Rmax [Pflops] % of Peak Power [MW] MFlops /Watt 1 2 3 4 National
More informationConfessions of an Accidental Benchmarker
Confessions of an Accidental Benchmarker http://bit.ly/hpcg-benchmark 1 Appendix B of the Linpack Users Guide Designed to help users extrapolate execution Linpack software package First benchmark report
More informationExperiences in Optimizations of Preconditioned Iterative Solvers for FEM/FVM Applications & Matrix Assembly of FEM using Intel Xeon Phi
Experiences in Optimizations of Preconditioned Iterative Solvers for FEM/FVM Applications & Matrix Assembly of FEM using Intel Xeon Phi Kengo Nakajima Supercomputing Research Division Information Technology
More informationOverview of Supercomputer Systems. Supercomputing Division Information Technology Center The University of Tokyo
Overview of Supercomputer Systems Supercomputing Division Information Technology Center The University of Tokyo Supercomputers at ITC, U. of Tokyo Oakleaf-fx (Fujitsu PRIMEHPC FX10) Total Peak performance
More informationIntroduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29
Introduction CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Introduction Spring 2018 1 / 29 Outline 1 Preface Course Details Course Requirements 2 Background Definitions
More informationParallel Computing & Accelerators. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
Parallel Computing Accelerators John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Purpose of this talk This is the 50,000 ft. view of the parallel computing landscape. We want
More informationUpdate of Post-K Development Yutaka Ishikawa RIKEN AICS
Update of Post-K Development Yutaka Ishikawa RIKEN AICS 11:20AM 11:40AM, 2 nd of November, 2017 FLAGSHIP2020 Project Missions Building the Japanese national flagship supercomputer, post K, and Developing
More informationThe next generation supercomputer. Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency
The next generation supercomputer and NWP system of JMA Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency Contents JMA supercomputer systems Current system (Mar
More informationOverview. CS 472 Concurrent & Parallel Programming University of Evansville
Overview CS 472 Concurrent & Parallel Programming University of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information Science, University
More information3D Parallel FEM (III) Parallel Visualization using ppopen-math/vis
3D Parallel FEM (III) Parallel Visualization using ppopen-math/vis Kengo Nakajima Technical & Scientific Computing I (4820-1027) Seminar on Computer Science I (4810-1204) Parallel FEM ppopen-hpc Open Source
More informationOverview of Supercomputer Systems. Supercomputing Division Information Technology Center The University of Tokyo
Overview of Supercomputer Systems Supercomputing Division Information Technology Center The University of Tokyo Supercomputers at ITC, U. of Tokyo Oakleaf-fx (Fujitsu PRIMEHPC FX10) Total Peak performance
More informationFujitsu s Technologies to the K Computer
Fujitsu s Technologies to the K Computer - a journey to practical Petascale computing platform - June 21 nd, 2011 Motoi Okuda FUJITSU Ltd. Agenda The Next generation supercomputer project of Japan The
More informationEuropean Processor Initiative & RISC-V
European Processor Initiative & RISC-V Prof. Mateo Valero BSC Director 9/May/2018 RISC-V Workshop, Barcelona Barcelona Supercomputing Center Centro Nacional de Supercomputación BSC-CNS objectives Supercomputing
More informationHybrid Architectures Why Should I Bother?
Hybrid Architectures Why Should I Bother? CSCS-FoMICS-USI Summer School on Computer Simulations in Science and Engineering Michael Bader July 8 19, 2013 Computer Simulations in Science and Engineering,
More informationIntroduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620
Introduction to Parallel and Distributed Computing Linh B. Ngo CPSC 3620 Overview: What is Parallel Computing To be run using multiple processors A problem is broken into discrete parts that can be solved
More informationHPC future trends from a science perspective
HPC future trends from a science perspective Simon McIntosh-Smith University of Bristol HPC Research Group simonm@cs.bris.ac.uk 1 Business as usual? We've all got used to new machines being relatively
More informationHigh-Performance Scientific Computing
High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org
More informationIterative Linear Solvers for Sparse Matrices
Iterative Linear Solvers for Sparse Matrices Kengo Nakajima Information Technology Center, The University of Tokyo, Japan 2013 International Summer School on HPC Challenges in Computational Sciences New
More informationENDURING DIFFERENTIATION. Timothy Lanfear
ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 Transistors (thousands) 1.1X per year 10 3 10 2 Single-threaded
More informationENDURING DIFFERENTIATION Timothy Lanfear
ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING GPU-ACCELERATED PERFORMANCE 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 10 3 10 2 Single-threaded perf
More informationThe Future of High- Performance Computing
Lecture 26: The Future of High- Performance Computing Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2017 Comparing Two Large-Scale Systems Oakridge Titan Google Data Center Monolithic
More informationPiz Daint: Application driven co-design of a supercomputer based on Cray s adaptive system design
Piz Daint: Application driven co-design of a supercomputer based on Cray s adaptive system design Sadaf Alam & Thomas Schulthess CSCS & ETHzürich CUG 2014 * Timelines & releases are not precise Top 500
More informationDesigning High-Performance MPI Collectives in MVAPICH2 for HPC and Deep Learning
5th ANNUAL WORKSHOP 209 Designing High-Performance MPI Collectives in MVAPICH2 for HPC and Deep Learning Hari Subramoni Dhabaleswar K. (DK) Panda The Ohio State University The Ohio State University E-mail:
More informationSpeedup Altair RADIOSS Solvers Using NVIDIA GPU
Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair
More informationOverview of Supercomputer Systems. Supercomputing Division Information Technology Center The University of Tokyo
Overview of Supercomputer Systems Supercomputing Division Information Technology Center The University of Tokyo Supercomputers at ITC, U. of Tokyo Oakleaf-fx (Fujitsu PRIMEHPC FX10) Total Peak performance
More informationEarly Experiences Writing Performance Portable OpenMP 4 Codes
Early Experiences Writing Performance Portable OpenMP 4 Codes Verónica G. Vergara Larrea Wayne Joubert M. Graham Lopez Oscar Hernandez Oak Ridge National Laboratory Problem statement APU FPGA neuromorphic
More informationINSPUR and HPC Innovation
INSPUR and HPC Innovation Dong Qi (Forrest) Product manager Inspur dongqi@inspur.com Contents 1 2 3 4 5 Inspur introduction HPC Challenge and Inspur HPC strategy HPC cases Inspur contribution to HPC community
More informationReport on the Sunway TaihuLight System. Jack Dongarra. University of Tennessee. Oak Ridge National Laboratory
Report on the Sunway TaihuLight System Jack Dongarra University of Tennessee Oak Ridge National Laboratory June 24, 2016 University of Tennessee Department of Electrical Engineering and Computer Science
More informationCRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar
CRAY XK6 REDEFINING SUPERCOMPUTING - Sanjana Rakhecha - Nishad Nerurkar CONTENTS Introduction History Specifications Cray XK6 Architecture Performance Industry acceptance and applications Summary INTRODUCTION
More informationHPC Algorithms and Applications
HPC Algorithms and Applications Intro Michael Bader Winter 2015/2016 Intro, Winter 2015/2016 1 Part I Scientific Computing and Numerical Simulation Intro, Winter 2015/2016 2 The Simulation Pipeline phenomenon,
More informationPerformance and Energy Usage of Workloads on KNL and Haswell Architectures
Performance and Energy Usage of Workloads on KNL and Haswell Architectures Tyler Allen 1 Christopher Daley 2 Doug Doerfler 2 Brian Austin 2 Nicholas Wright 2 1 Clemson University 2 National Energy Research
More informationIntroduction of Oakforest-PACS
Introduction of Oakforest-PACS Hiroshi Nakamura Director of Information Technology Center The Univ. of Tokyo (Director of JCAHPC) Outline Supercomputer deployment plan in Japan What is JCAHPC? Oakforest-PACS
More informationJapan s post K Computer Yutaka Ishikawa Project Leader RIKEN AICS
Japan s post K Computer Yutaka Ishikawa Project Leader RIKEN AICS HPC User Forum, 7 th September, 2016 Outline of Talk Introduction of FLAGSHIP2020 project An Overview of post K system Concluding Remarks
More informationJack Dongarra University of Tennessee Oak Ridge National Laboratory
Jack Dongarra University of Tennessee Oak Ridge National Laboratory 3/9/11 1 TPP performance Rate Size 2 100 Pflop/s 100000000 10 Pflop/s 10000000 1 Pflop/s 1000000 100 Tflop/s 100000 10 Tflop/s 10000
More informationResources Current and Future Systems. Timothy H. Kaiser, Ph.D.
Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic
More informationMapping MPI+X Applications to Multi-GPU Architectures
Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under
More informationECE 574 Cluster Computing Lecture 18
ECE 574 Cluster Computing Lecture 18 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 2 April 2019 HW#8 was posted Announcements 1 Project Topic Notes I responded to everyone s
More informationNERSC Site Update. National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory. Richard Gerber
NERSC Site Update National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory Richard Gerber NERSC Senior Science Advisor High Performance Computing Department Head Cori
More informationIntroduction to High-Performance Computing
Introduction to High-Performance Computing Dr. Axel Kohlmeyer Associate Dean for Scientific Computing, CST Associate Director, Institute for Computational Science Assistant Vice President for High-Performance
More informationTrends of Network Topology on Supercomputers. Michihiro Koibuchi National Institute of Informatics, Japan 2018/11/27
Trends of Network Topology on Supercomputers Michihiro Koibuchi National Institute of Informatics, Japan 2018/11/27 From Graph Golf to Real Interconnection Networks Case 1: On-chip Networks Case 2: Supercomputer
More informationPreparing GPU-Accelerated Applications for the Summit Supercomputer
Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership
More informationAmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015
AmgX 2.0: Scaling toward CORAL Joe Eaton, November 19, 2015 Agenda Introduction to AmgX Current Capabilities Scaling V2.0 Roadmap for the future 2 AmgX Fast, scalable linear solvers, emphasis on iterative
More informationPerformance of deal.ii on a node
Performance of deal.ii on a node Bruno Turcksin Texas A&M University, Dept. of Mathematics Bruno Turcksin Deal.II on a node 1/37 Outline 1 Introduction 2 Architecture 3 Paralution 4 Other Libraries 5 Conclusions
More informationOn Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators
On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators Karl Rupp, Barry Smith rupp@mcs.anl.gov Mathematics and Computer Science Division Argonne National Laboratory FEMTEC
More informationin Action Fujitsu High Performance Computing Ecosystem Human Centric Innovation Innovation Flexibility Simplicity
Fujitsu High Performance Computing Ecosystem Human Centric Innovation in Action Dr. Pierre Lagier Chief Technology Officer Fujitsu Systems Europe Innovation Flexibility Simplicity INTERNAL USE ONLY 0 Copyright
More informationWhat have we learned from the TOP500 lists?
What have we learned from the TOP500 lists? Hans Werner Meuer University of Mannheim and Prometeus GmbH Sun HPC Consortium Meeting Heidelberg, Germany June 19-20, 2001 Outlook TOP500 Approach Snapshots
More informationOverview and history of high performance computing
Overview and history of high performance computing CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Overview and history of high performance computing Spring 2018 1
More informationHow to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić
How to perform HPL on CPU&GPU clusters Dr.sc. Draško Tomić email: drasko.tomic@hp.com Forecasting is not so easy, HPL benchmarking could be even more difficult Agenda TOP500 GPU trends Some basics about
More informationTOP500 List s Twice-Yearly Snapshots of World s Fastest Supercomputers Develop Into Big Picture of Changing Technology
TOP500 List s Twice-Yearly Snapshots of World s Fastest Supercomputers Develop Into Big Picture of Changing Technology BY ERICH STROHMAIER COMPUTER SCIENTIST, FUTURE TECHNOLOGIES GROUP, LAWRENCE BERKELEY
More informationOverview of Tianhe-2
Overview of Tianhe-2 (MilkyWay-2) Supercomputer Yutong Lu School of Computer Science, National University of Defense Technology; State Key Laboratory of High Performance Computing, China ytlu@nudt.edu.cn
More informationHigh Performance Computing in Europe and USA: A Comparison
High Performance Computing in Europe and USA: A Comparison Hans Werner Meuer University of Mannheim and Prometeus GmbH 2nd European Stochastic Experts Forum Baden-Baden, June 28-29, 2001 Outlook Introduction
More informationEmerging Heterogeneous Technologies for High Performance Computing
MURPA (Monash Undergraduate Research Projects Abroad) Emerging Heterogeneous Technologies for High Performance Computing Jack Dongarra University of Tennessee Oak Ridge National Lab University of Manchester
More informationNVIDIA Update and Directions on GPU Acceleration for Earth System Models
NVIDIA Update and Directions on GPU Acceleration for Earth System Models Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA Carl Ponder, PhD, Applications Software Engineer, NVIDIA,
More informationBig Data Analytics Performance for Large Out-Of- Core Matrix Solvers on Advanced Hybrid Architectures
Procedia Computer Science Volume 51, 2015, Pages 2774 2778 ICCS 2015 International Conference On Computational Science Big Data Analytics Performance for Large Out-Of- Core Matrix Solvers on Advanced Hybrid
More informationMAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures
MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures Stan Tomov Innovative Computing Laboratory University of Tennessee, Knoxville OLCF Seminar Series, ORNL June 16, 2010
More informationSteve Scott, Tesla CTO SC 11 November 15, 2011
Steve Scott, Tesla CTO SC 11 November 15, 2011 What goal do these products have in common? Performance / W Exaflop Expectations First Exaflop Computer K Computer ~10 MW CM5 ~200 KW Not constant size, cost
More informationINSPUR and HPC Innovation. Dong Qi (Forrest) Oversea PM
INSPUR and HPC Innovation Dong Qi (Forrest) Oversea PM dongqi@inspur.com Contents 1 2 3 4 5 Inspur introduction HPC Challenge and Inspur HPC strategy HPC cases Inspur contribution to HPC community Inspur
More informationResources Current and Future Systems. Timothy H. Kaiser, Ph.D.
Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic
More informationANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation
ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation Ray Browell nvidia Technology Theater SC12 1 2012 ANSYS, Inc. nvidia Technology Theater SC12 HPC Revolution Recent
More informationTowards Exascale Computing with the Atmospheric Model NUMA
Towards Exascale Computing with the Atmospheric Model NUMA Andreas Müller, Daniel S. Abdi, Michal Kopera, Lucas Wilcox, Francis X. Giraldo Department of Applied Mathematics Naval Postgraduate School, Monterey
More informationCUDA Kernel based Collective Reduction Operations on Large-scale GPU Clusters
CUDA Kernel based Collective Reduction Operations on Large-scale GPU Clusters Ching-Hsiang Chu, Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan and Dhabaleswar K. (DK) Panda Speaker: Sourav Chakraborty
More informationFabio AFFINITO.
Introduction to High Performance Computing Fabio AFFINITO What is the meaning of High Performance Computing? What does HIGH PERFORMANCE mean??? 1976... Cray-1 supercomputer First commercial successful
More informationFault Tolerance Techniques for Sparse Matrix Methods
Fault Tolerance Techniques for Sparse Matrix Methods Simon McIntosh-Smith Rob Hunt An Intel Parallel Computing Center Twitter: @simonmcs 1 ! Acknowledgements Funded by FP7 Exascale project: Mont Blanc
More informationIntroduction to National Supercomputing Centre in Guangzhou and Opportunities for International Collaboration
Exascale Applications and Software Conference 21st 23rd April 2015, Edinburgh, UK Introduction to National Supercomputing Centre in Guangzhou and Opportunities for International Collaboration Xue-Feng
More informationINTRODUCTION TO OPENACC. Analyzing and Parallelizing with OpenACC, Feb 22, 2017
INTRODUCTION TO OPENACC Analyzing and Parallelizing with OpenACC, Feb 22, 2017 Objective: Enable you to to accelerate your applications with OpenACC. 2 Today s Objectives Understand what OpenACC is and
More information