2018/9/25 (1), (I)


1 2018/9/25 (1), (I)

2 2018/9/25 (1), (I)

3 2018/9/25 (1), (I) MPI 3. D)

4 2018/9/25 (1), (I) 4

5 2018/9/25 (1), (I) GPU

6 2018/9/25 (1), (I) () l l l () l l l l l l l l l l RB-H 6

7 2018/9/25 (1), (I) 7 1. l l l 2. l l

8 2018/9/25 (1), (I) (28) S1S (28) A1A (29) S1S : (29) A1A : 1: (30) S1S : 15: 8: 052

9 2018/9/25 (1), (I) 9 Fortran 50

10 2018/9/25 (1), (I) 10 Oakforest-PACS 109 GPU (Reedbush-H)

11 2018/9/25 (1), (I) 11 l PDF

12 2018/9/25 (1), (I) 12

13 2018/9/25 (1), (I) 13

14 2018/9/25 (1), (I) 14

15 2018/9/25 (1), (I) 15

16 2018/9/25 (1), (I) FX10 2. PCMPI 3. PCMPI

17 2018/9/25 (1), (I) 17

18 2018/9/25 (1), (I) 18 PC =>=> (26331) I 3. 50 TFLOPS Reedbush, Oakforest-PACS GPU!!

19 2018/9/25 (1), (I) 19 (1) CPU-based systems: MPP (Massively Parallel Processor): PRIMEHPC FX, Cray XC (interconnects: Tofu, Cray Aries, Intel Omni-Path, InfiniBand, Ethernet); NEC SX; SMP (Symmetric Multi Processor): HP (SGI) UV (256 CPU)

20 2018/9/25 (1), (I) 20 (2) Accelerators: GPU (NVIDIA Tesla), PEZY-SC2, Intel Xeon Phi (Knights Corner), NEC SX-Aurora TSUBASA; attached via PCI Express => ITC GPU system: Reedbush-L

21 2018/9/25 (1), (I) 21 TFLOPS: Tera FLoating point OPerations per Second. FLOPS with prefixes K, M, G, T, P => PFLOPS. Examples: a PC at 4.2 GHz: 4.2 GFLOPS (at 1 FLOP per cycle); Intel Core i7 (Skylake): 4.2 GHz * 16 FLOP/Hz * 4 cores = 268.8 GFLOPS; Cray-1: 160 MFLOPS, so the PC is about 1,680 times faster.
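
The peak-performance arithmetic above can be written out directly. A minimal sketch in C using the figures from this slide (4.2 GHz, 16 FLOP per cycle, 4 cores); reading "16 FLOP/Hz" as AVX2 FMA throughput (4 doubles x 2 operations x 2 units) is an interpretation, not something stated on the slide:

```c
/* Theoretical peak = clock (GHz) x FLOP per cycle x cores.
 * Values follow the Core i7 (Skylake) example on the slide;
 * the AVX2-FMA reading of "16 FLOP/Hz" is an assumption. */
#include <stdio.h>

int main(void) {
    double ghz            = 4.2;   /* clock frequency (GHz) */
    double flop_per_cycle = 16.0;  /* double-precision FLOP per cycle per core */
    double cores          = 4.0;   /* cores per chip */

    double gflops = ghz * flop_per_cycle * cores;            /* 268.8 GFLOPS */
    printf("peak: %.1f GFLOPS\n", gflops);
    printf("vs. Cray-1 (160 MFLOPS): %.0fx\n", gflops / 0.160); /* ~1680x */
    return 0;
}
```

The same formula, with the Cray-1's 160 MFLOPS, reproduces the roughly 1,680x ratio quoted on the slide.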

22 2018/9/25 (1), (I) 22 Theoretical Peak Performance (FLOPS) vs. Effective Performance (FLOPS); benchmarks: LINPACK and HPCG (Conjugate Gradient, CG)

23 2018/9/25 (1), (I) 23 LINPACK 500 () 410 Linpack (TaihuLight) 2 (Tianhe-2) Sequoia Titan 500

24 2018/9/25 (1), (I) 24 Intel

25 2018/9/25 (1), (I) 25 (1) TOP500 (http://): LINPACK, 500 systems, Jack Dongarra, 500

26 2018/9/25 (1), (I) 26 (2) Green500 (http://): based on the Top500; metric = Linpack performance / power = FLOPS/W; HPCG (Conjugate Gradient (CG)) vs. Linpack

27 2018/9/25 (1), (I) 27 (3) Graph500 (http://graph500.org/): TEPS (Traversed Edges Per Second), breadth-first search (BFS); since 2017/11 also SSSP (Single Source Shortest Paths); Green Graph500; IO500 (I/O performance: (IOPS), (GB/sec)), started 2017/11
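
To make the TEPS metric concrete, here is a minimal BFS sketch in C that counts traversed edges, the quantity Graph500 divides by elapsed time. The 5-vertex graph is invented for illustration; the real benchmark generates huge Kronecker graphs and runs the search in parallel.

```c
/* Minimal sketch of what Graph500 measures: a breadth-first search (BFS)
 * over a graph, reported as TEPS = traversed edges / elapsed time.
 * The tiny graph below is made-up example data. */
#include <stdio.h>
#include <time.h>

#define NV 5  /* number of vertices (toy example) */

int main(void) {
    int adj[NV][NV] = {           /* adjacency matrix of a small undirected graph */
        {0,1,1,0,0},
        {1,0,1,1,0},
        {1,1,0,0,1},
        {0,1,0,0,1},
        {0,0,1,1,0}
    };
    int visited[NV] = {0}, queue[NV], head = 0, tail = 0;
    long traversed = 0;

    clock_t t0 = clock();
    visited[0] = 1;               /* start BFS from vertex 0 */
    queue[tail++] = 0;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < NV; v++) {
            if (!adj[u][v]) continue;
            traversed++;          /* every examined edge counts toward TEPS */
            if (!visited[v]) { visited[v] = 1; queue[tail++] = v; }
        }
    }
    double sec = (double)(clock() - t0) / CLOCKS_PER_SEC;
    printf("traversed edges: %ld\n", traversed);
    if (sec > 0.0)
        printf("TEPS: %.3e\n", traversed / sec);
    return 0;
}
```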

28 2018/9/25 (1), (I) 28 51st TOP500 List (June, 2018). Rmax: Linpack performance (TFLOPS), Rpeak: peak performance (TFLOPS), Power: kW.
1 Summit, 2018, USA, DOE/SC/Oak Ridge National Laboratory: IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband; 2,282,544 cores, Rmax 122,300 (= 122.3 PF), Rpeak 187,659, Power 8,806
2 Sunway TaihuLight, 2016, China, National Supercomputing Center in Wuxi: Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway; 10,649,600 cores, Rmax 93,015, Rpeak 125,436, Power 15,371
3 Sierra, 2018, USA, DOE/NNSA/LLNL: IBM Power System S922LC, IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband; 1,572,480 cores, Rmax 71,610, Rpeak 119,194
4 Tianhe-2A, 2018, China, National Super Computer Center in Guangzhou: TH-IVB-FEP Cluster, Intel Xeon E5-2692v2 12C 2.2GHz, TH Express-2, Matrix-2000; 4,981,760 cores, Rmax 61,445, Rpeak 100,679, Power 18,482
5 ABCI (AI Bridging Cloud Infrastructure), 2018, Japan, National Institute of Advanced Industrial Science and Technology (AIST): PRIMERGY CX2550 M4, Xeon Gold 6148 20C 2.4GHz, NVIDIA Tesla V100 SXM2, Infiniband EDR; 391,680 cores, Rmax 19,880, Rpeak 32,577, Power 1,649
6 Piz Daint, 2017, Switzerland, Swiss National Supercomputing Centre (CSCS): Cray XC50, Xeon E5-2690v3 12C 2.6GHz, Aries interconnect, NVIDIA Tesla P100; 361,760 cores, Rmax 19,590, Rpeak 25,326, Power 2,272
7 Titan, 2012, USA, DOE/SC/Oak Ridge National Laboratory: Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x; 560,640 cores, Rmax 17,590, Rpeak 27,113, Power 8,209
8 Sequoia, 2011, USA, DOE/NNSA/LLNL: BlueGene/Q, Power BQC 16C 1.60 GHz, Custom; 1,572,864 cores, Rmax 17,173, Rpeak 20,133, Power 7,890
9 Trinity, 2017, USA, DOE/NNSA/LANL/SNL: Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect; 979,968 cores, Rmax 14,137, Rpeak 43,903, Power 3,844
10 Cori, 2016, USA, DOE/SC/LBNL/NERSC: Cray XC40, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect; 622,336 cores, Rmax 14,016, Rpeak 27,881, Power 3,939
12 Oakforest-PACS, 2016, Japan, Joint Center for Advanced High Performance Computing: PRIMERGY CX1640 M1, Intel Xeon Phi 7250 68C 1.4GHz, Intel Omni-Path; 556,104 cores, Rmax 13,556, Rpeak 24,913, Power 2,719

29 2018/9/25 (1), (I) 29 HPCG Ranking (June, 2018). Columns: Computer, Cores, HPL Rmax (Pflop/s), TOP500 Rank, HPCG (Pflop/s), HPCG/Peak (%).
Ranking: 1 Summit; 2 Sierra; 3 K computer; 4 Trinity; 5 Piz Daint; 6 Sunway TaihuLight; 7 Oakforest-PACS; 8 Cori; 9 Tera-1000-2; 10 Sequoia

30 2018/9/25 (1), (I) 30 Green500 Ranking (June, 2018). Columns: TOP500 Rank, System, Cores, HPL Rmax (Pflop/s), Power (MW), GFLOPS/W.
Systems, in Green500 order: Shoubu system B (Japan); Suiren2 (Japan); Sakura (Japan); DGX SaturnV Volta (USA); Summit (USA); TSUBAME3.0 (Japan); AIST AI Cloud (Japan); ABCI (Japan); MareNostrum P9 CTE (Spain); RAIDEN GPU (Japan); Reedbush-L (U.Tokyo, Japan); Reedbush-H (U.Tokyo, Japan)

31 2018/9/25 (1), (I) 31 IO500 Ranking (Nov., 2017). Columns: Site, Computer, File system, Client nodes, IO500 Score, BW (GiB/s), MD (kIOP/s).
Ranking: 1 JCAHPC (Japan), Oakforest-PACS, DDN IME; 2 KAUST (Saudi Arabia), Shaheen2, Cray DataWarp; 3 KAUST (Saudi Arabia), Shaheen2, Lustre; 4 JSC (Germany), JURON, BeeGFS; 5 DKRZ (Germany), Mistral, Lustre; 6 IBM (USA), Sonasad, Spectrum Scale; 7 Fraunhofer (Germany), Seislab, BeeGFS; 8 PNNL (USA), EMSL Cascade, Lustre; 9 SNL (USA), Serrano, Spectrum Scale

32 2018/9/25 (1), (I) 32 Japanese systems in the Top500: 16 K computer (R-CCS): 10.5 PFLOPS; 19 TSUBAME3.0: PFLOPS; 25: PFLOPS x2; 32 ITO: 4.54 PFLOPS; 50 JAXA SORA-MA: 3.15 PFLOPS; 54 Camphor: PFLOPS; 56 FX: PFLOPS; 61 JFRS: PFLOPS; 77 FX: PFLOPS; 83 ATERUI II: 2.08 PFLOPS; 180 Sekirei: PFLOPS; 411 Reedbush: PFLOPS; 414 Reedbush-H: PFLOPS; 436 Sekirei-ACC: PFLOPS

33 2018/9/25 (1), (I) 33 Sunway TaihuLight (Wuxi, NRCPC): peak 125.4 PF, Linpack 93.0 PF; Sunway SW26010 ((1+64) x 4 cores, 1.45GHz, 3.06TF, GB/s); InfiniBand FDR. Sources: Top500, HPCWire Japan, PCwatch

34 2018/9/25 (1), (I) 34 Piz Daint, CSCS (ETH Zurich): 33.8 PF, Linpack 19.5 PF (2017 upgrade); 5,320 nodes (P100 + Xeon Haswell) + 1,431 Xeon Broadwell nodes; Cray XC50 + XC40

35 2018/9/25 (1), (I) 35 Cori at NERSC. NERSC: National Energy Research Scientific Computing Center (DoE, LBNL). 9,688 Intel Xeon Phi (KNL) nodes, 30 PF, + 2,388 Intel Xeon (Haswell) nodes; Cray XC40. Named after Gerty Cori.

36 2018/9/25 (1), (I) 36 K computer (RIKEN R-CCS): SPARC64 VIIIfx (128 GFLOPS per CPU); TOP500 / LINPACK: 10.5 PFLOPS

37 2018/9/25 (1), (I) NEC SX-ACE 5,120 4.PFOPS 1.3PB/sec

38 2018/9/25 (1), (I) 38 TSUBAME3.0: HPE ICE-XA, 540 nodes; CPU: Intel Xeon E5-2680v4 2.4 GHz (14 cores) x 2 (Hyperthreading enabled); GPU: NVIDIA Tesla P100 x 4; Memory: 256GB

39 2018/9/25 (1), (I) 39 Supercomputers at ITC/U.Tokyo by FY: Yayoi (Hitachi SR16000/M1, IBM Power7): 54.9 TFLOPS, 11.2 TB; T2K Tokyo (Hitachi HA8000): 140TF, 31.3TB; Oakleaf-FX (Fujitsu PRIMEHPC FX10, SPARC64 IXfx): 1.13 PFLOPS, 150 TB; Oakbridge-FX: 136.2 TFLOPS, 18.4 TB; Reedbush (HPE, Broadwell + Pascal): 1.93 PFLOPS; Reedbush-L (HPE): 1.43 PFLOPS; Oakforest-PACS (Fujitsu, Intel KNL): 25 PFLOPS, 919.3 TB; Oakbridge-II (Intel/AMD/P9/ARM, CPU only): 5-10 PFLOPS; BDEC System (Big Data & Extreme Computing): 50+ PFLOPS (?)

40 2018/9/25 (1), (I) 40 2 (or 4) systems in operation: Oakleaf-FX (Fujitsu PRIMEHPC FX10): 1.13 PF; Oakbridge-FX (Fujitsu PRIMEHPC FX10): 136.2 TF, for long jobs (168 hours); Reedbush (HPE, Intel BDW + NVIDIA P100 (Pascal)): ITC's first GPU system, with DDN IME (Burst Buffer). Reedbush-U: CPU only, 420 nodes, 508 TF (2016.7); Reedbush-H: 120 nodes, 2 GPUs/node, 1.42 PF (2017.3); Reedbush-L: 64 nodes, 4 GPUs/node, 1.43 PF (2017.10). Oakforest-PACS (OFP) (Intel Xeon Phi (KNL))

41 2018/9/25 (1), (I) 41 Reedbush in the rankings. Top500: RB-L, RB-H (2017), RB-U (2016); Green500: RB-L, RB-H (2017). Reedbush-U / Reedbush-H / Reedbush-L

42 2018/9/25 (1), (I) 42

43 2018/9/25 (1), (I) 43 Reedbush node specifications (Reedbush-U / Reedbush-H / Reedbush-L):
CPU/node: Intel Xeon E5-2695v4 (Broadwell-EP, 2.1GHz, 18core) x 2 sockets (1.210 TF), 256 GiB (153.6GB/sec)
GPU: none (U); NVIDIA Tesla P100 (Pascal, 5.3TF, 720GB/sec, 16GiB) (H, L)
Interconnect: Infiniband EDR (U), FDR x 2ch (H), EDR x 2ch (L)
Nodes / GPUs: U: 420 nodes; H: 120 nodes, 240 GPUs (= 120 x 2); L: 64 nodes, 256 GPUs (= 64 x 4)
Peak performance (TFLOPS): U: 508; H: 1,417 (of which 1,272 from GPUs); L: 1,433 (of which 1,358 from GPUs)
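
The peak figures in the table above follow from per-node arithmetic. A small sketch in C; the 16 DP FLOP/cycle per Broadwell-EP core (AVX2 FMA) used to reproduce the 1.210 TF figure is an assumption, while clock, core, node and GPU counts come from the table.

```c
/* Reedbush node/system peak arithmetic (sketch).
 * 16 DP FLOP/cycle per core is an assumed AVX2-FMA figure. */
#include <stdio.h>

int main(void) {
    double cpu_node_tf = 2.1 * 16.0 * 18.0 * 2.0 / 1000.0;  /* 2 sockets -> ~1.21 TF */
    double p100_tf     = 5.3;                                /* per GPU, from the table */

    printf("CPU peak per node  : %.3f TF\n", cpu_node_tf);
    printf("RB-H node (2 GPUs) : %.2f TF\n", cpu_node_tf + 2 * p100_tf);
    printf("RB-H system        : %.0f TF\n", 120 * (cpu_node_tf + 2 * p100_tf)); /* ~1,417 */
    printf("RB-L system        : %.0f TF\n", 64 * (cpu_node_tf + 4 * p100_tf));  /* ~1,434 */
    return 0;
}
```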

44 2018/9/25 (1), (I) 44 Oakforest-PACS (OFP): 8,208 nodes, Intel Xeon Phi (KNL), 25 PFLOPS; TOP500 / HPCG; operated by JCAHPC (Joint Center for Advanced High Performance Computing)

45 2018/9/25 (1), (I) Oakforest-PACS

46 2018/9/25 (1), (I) Features of Oakforest-PACS (1/2): 1 node = 68 cores, 3 TFLOPS; 8,208 nodes, 25 PFLOPS in total; memory: MCDRAM 16GB + DDR4 96GB per node; network: Fat-Tree, Intel Omni-Path Architecture

47 2018/9/25 (1), (I) Oakforest-PACS specifications: 8,208 nodes, 25 PFLOPS total peak. Product: PRIMERGY CX600 M1 (2U) + CX1640 M1 x 8 node. CPU: Intel Xeon Phi 7250 (Knights Landing, 68 core, 1.4 GHz). Memory: 16 GB MCDRAM (490 GB/sec) + 96 GB DDR4-2400 (115.2 GB/sec). Interconnect: Intel Omni-Path Architecture, 100 Gbps, Fat-tree
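
The 25 PFLOPS figure above is consistent with simple per-node arithmetic. A sketch in C; the 32 DP FLOP/cycle per KNL core (two AVX-512 FMA units) and the 68-core, 1.4 GHz values are standard published figures for the Xeon Phi 7250, not all of which survive in this transcription.

```c
/* Oakforest-PACS peak arithmetic (sketch): GHz x cores x FLOP/cycle x nodes.
 * 32 DP FLOP/cycle per core (2 AVX-512 FMA VPUs) is an assumption here. */
#include <stdio.h>

int main(void) {
    double node_gf = 1.4 * 68.0 * 32.0;      /* per-node peak in GFLOPS */
    double sys_pf  = node_gf * 8208.0 / 1e6; /* 8,208 nodes, GFLOPS -> PFLOPS */

    printf("KNL 7250 node peak : %.1f GFLOPS (~3 TFLOPS)\n", node_gf);
    printf("System peak        : %.1f PFLOPS\n", sys_pf);   /* ~25 PFLOPS */
    return 0;
}
```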

48 2018/9/25 (1), (I) 48 Features of Oakforest-PACS (2/2): I/O: Lustre file system 26PB + DDN IME burst buffer (1TB/sec, 1PB); Green500 6th: Linpack at 2.72 MW, 4,986 MFLOPS/W (OFP) vs. 830 MFLOPS/W (K computer); 120

49 2018/9/25 (1), (I) 49 Oakforest-PACS storage (102 racks). Parallel file system: Lustre, 26.2 PB, DataDirect Networks SFA14KE, 500 GB/sec. Burst buffer (file cache): Infinite Memory Engine (by DDN), 940 TB (NVMe SSD), DataDirect Networks IME14K, 1,560 GB/sec. Power consumption: 4.2MW

50 2018/9/25 (1), (I) Oakforest-PACS software. OS: Red Hat Enterprise Linux (login nodes), CentOS or McKernel (compute nodes); McKernel is a lightweight OS kernel developed at RIKEN AICS, compatible with Linux. Compilers and languages: GCC, Intel Compiler, XcalableMP; XcalableMP is a parallel language for C/Fortran developed at RIKEN AICS. Libraries and applications: ppOpen-HPC, OpenFOAM, ABINIT-MP, PHASE system, FrontFlow/blue, LAPACK, ScaLAPACK, PETSc, METIS, SuperLU, etc.

51 2018/9/25 (1), (I) 51 FX10 compute node (diagram): 16 cores (Core #0 - Core #15) sharing a 12MB L2 cache; memory bandwidth 85 GB/s (= 8 Byte x 1333 MHz x 8 channels); DDR3 DIMM 4GB x 2 per channel, 8GB x 4 = 32GB per node; connected to the Tofu network via the ICC.
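
The bandwidth formula on this slide (bytes per transfer x memory frequency x channels) also reappears on the Broadwell-EP and KNL slides below; a tiny sketch in C evaluates it for the three cases.

```c
/* Memory-bandwidth arithmetic: bytes x effective frequency (MHz) x channels. */
#include <stdio.h>

static double bw_gb_s(double bytes, double mhz, double channels) {
    return bytes * mhz * channels / 1000.0;  /* MB/s -> GB/s */
}

int main(void) {
    /* FX10 node: 8 Byte x 1333 MHz x 8 channels = ~85 GB/s */
    printf("FX10        : %.1f GB/s\n", bw_gb_s(8, 1333, 8));
    /* Broadwell-EP socket (later slide): 8 Byte x 2400 MHz x 4 channels */
    printf("Broadwell-EP: %.1f GB/s\n", bw_gb_s(8, 2400, 4));
    /* KNL DDR4 (later slide): 8 Byte x 2400 MHz x 6 channels */
    printf("KNL DDR4    : %.1f GB/s\n", bw_gb_s(8, 2400, 6));
    return 0;
}
```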

52 2018/9/25 (1), (I) 52 FX10 Tofu interconnect: 6-dimensional mesh/torus, 5 GB/s per link

53 2018/9/25 (1), (I) 53 FX10 Tofu interconnect (diagram): nodes connected by Tofu links along the X, Y and Z directions.

54 2018/9/25 (1), (I) 54 Reedbush-U compute node (diagram) => NUMA (Non-Uniform Memory Access), unlike the single-socket FX10 node: two Intel Xeon E5-2695 v4 (Broadwell-EP) sockets connected by QPI, each with 128GB of DDR4 (4 channels, 76.8 GB/s per socket); IB EDR HCA attached via PCIe Gen3 x16.

55 2018/9/25 (1), (I) 55 Broadwell-EP socket (diagram): 18 cores, QPI x2, PCIe; caches: L1 32KB, L2 256KB per core, L3 2.5MB per core (shared) => 45MB L3 in total; DDR4 DIMM 16GB x 2 per channel, 16GB x 8 = 128GB; memory bandwidth 76.8 GB/s (= 8 Byte x 2400 MHz x 4 channels).

56 2018/9/25 (1), (I) 56 Reedbush-U interconnect: Fat Tree, Mellanox InfiniBand EDR 4x; CS7500 Director switch and SB7800 (36-port) Spine/Leaf switches; uplink: 18, downlink: 18; RB-H and RB-L connect to the same fabric.

57 2018/9/25 (1), (I) 57 Reedbush-H compute node (diagram): two Intel Xeon E5-2695 v4 (Broadwell-EP) sockets connected by QPI, each with 128GB DDR4 (76.8 GB/s); each socket attaches via PCIe Gen3 x16 (15.7 GB/s) and a PCIe switch to an NVIDIA Pascal (P100) GPU and an IB FDR HCA; the two GPUs are linked by NVLink (20 GB/s); both HCAs connect to the EDR switch.

58 2018/9/25 (1), (I) 58 Oakforest-PACS node: Intel Xeon Phi (Knights Landing), 1 socket per node; memory: MCDRAM 16GB + DDR4 16GB x 6 = 96GB; MCDRAM: 490 GB/s, DDR4: 115.2 GB/s (= 8 Byte x 2400 MHz x 6 channels); each tile = 2 cores sharing 1MB L2, with 2 VPUs per core; 36 tiles connected by a 2D mesh interconnect; 3 + 3 DDR4 channels, PCIe Gen3 (2 x16, 1 x4), DMI. (Source: Hot Chips 27, Knights Landing Overview)

59 2018/9/25 (1), (I) 59 Oakforest-PACS interconnect: Intel Omni-Path Architecture, Fat-tree; 12 x 768-port Director Switches, 362 x 48-port Edge Switches; uplink: 24, downlink: 24. (Source: Intel)

60 2018/9/25 (1), (I) 60 Oakforest-PACS 100,000 18() ,000 ( 480,000) Reedbush

61 2018/9/25 (1), (I) 61 Reedbush ,000 RB-U: 416 RB-H: 1 2.5x RB-L: 1 4.0x 300, RB-H 1U x 2.5 RB-L 1U x 4.0 RB-U 360, RB-H 216, RB-L 360, (H: 2.5, L: 4.0) 2 Oakforest-PACS

62 2018/9/25 (1), (I) 62 JPY (= Watt)/GFLOPS rate (smaller is better, i.e. more efficient). Systems compared: Oakleaf/Oakbridge-FX (Fujitsu PRIMEHPC FX10); Reedbush-U (SGI, Intel BDW); Reedbush-H (SGI, Intel BDW + NVIDIA P100); Oakforest-PACS (Fujitsu, Intel Xeon Phi / Knights Landing)

63 2018/9/25 (1), (I) 63 RB-U90% %10 OFP MW

64 2018/9/25 (1), (I) 64 T T / T T /

65 2018/9/25 (1), (I) 65 MPI (Message Passing Interface): message-passing communication (over an interconnect or TCP/IP) for Massively Parallel Processing (MPP) systems; an API (Application Programming Interface) for distributed-memory parallel programming.
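
As a concrete starting point, a minimal MPI program in C (not taken from the lecture materials): every process reports its rank. Compile with an MPI compiler wrapper such as mpicc and launch with mpirun/mpiexec.

```c
/* Minimal MPI example: each process gets a rank and prints it. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();                        /* shut down MPI */
    return 0;
}
```

For example, "mpirun -np 4 ./a.out" starts four processes, each printing its own rank.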

66 2018/9/25 (1), (I) 66 Oakforest-PACS

67 2018/9/25 (1), (I) 67
