Sunway TaihuLight: The system and applications. Zhao Liu Director of Application Support Department National Supercomputing Center in Wuxi
2 Outline Sunway Machine Applications and Programming Challenges Long Term Plans
3 Sunway TaihuLight 神威 太湖之光
4 Sunway TaihuLight City Rank in Top50 Shanghai 1 Suzhou 7 Wuxi 14 Nantong 24 Changzhou 34 Jiaxing 50
5 The Sunway Machine Family
Sunway-I: CMA service (1998), commercial chips, 0.384 Tflops, 48th on TOP500
Sunway BlueLight: NSCC-Jinan, 16-core processor, 1 Pflops, 14th on TOP500
Sunway TaihuLight: NSCC-Wuxi, 260-core processor, 125 Pflops, 1st on TOP500
6 Overall Architecture of SW26010
MPE: management processing element
CPE: computing processing element, organized in clusters
IMPE: intelligent memory processing element, each attached to its own main memory
SI: system interface
All components are connected through a network on chip.
7 The Architecture of the CPE Cluster
communication row and column buses; credit-based wormhole switch; cluster controller
streaming engine: processing of DMA instructions
synchronization control: fast hardware synchronization for CPEs
consistency processing unit: assuring cache coherency with the MPEs
8 The MPE Microarchitecture
complete L1 (instruction and data) and L2 caches
3 execution pipes
full support for management, service, and computing functions
mainly designed for running the OS, the runtime, and programs with less parallelism
9 The CPE Microarchitecture
L1 cache for instructions only
Scratch Pad Memory (i.e., Local Data Memory) in place of an L1 data cache
2 execution pipes
mainly designed for highly parallel computation
10 High-Density Integration of the Computing System
A five-level integration hierarchy: computing node, computing board, super node, cabinet, entire computing system
15 A System with Over 10 Million Cores
16 Outline Sunway Machine Applications and Programming Challenges Long Term Plans
17 Applications Overview
Over 200 applications in different HPC domains: 15 full-scale applications; more than 50 applications that scale to half the system; more than 100 applications that scale to one million cores
Collaboration with more than 300 companies, research institutes, and universities
Porting open-source applications to Sunway, including WRF, VASP, LAMMPS, etc.
Developing highly efficient applications on Sunway
18 An (Incomplete) List of Full-Scale Applications
2016 (Gordon Bell Prize and Finalists):
Fully Implicit Solver for Atmospheric Dynamics (2016 Gordon Bell Prize)
Surface Wave Modeling
Phase Field Simulations of Coarsening Dynamics
Atomistic Simulation of Silicon Nanowires
Run-away Electron Trajectory Simulation
Genome Functional Annotation and Homeotic Gene Building
Spacecraft CFD Numerical Simulation
2017 (Gordon Bell Prize and Finalists):
Extreme-scale Graph Processing Framework (a graph processing application designed for local Internet companies in China)
Simulation of Planetary Rings (work of the Particle Simulator Research Team of RIKEN AICS)
Simulations of Quantum Spin Liquid States via PEPS++
Molecular Dynamics Simulation of Condensed Covalent Materials
Cryo-EM Macromolecule Structure Determination
Redesigning CAM-SE
Nonlinear Earthquake Simulation (2017 Gordon Bell Prize)
25 Two Major Challenges The Memory Barrier The Application Barrier
26 Machine Capability Comparison
[Radar chart comparing TaihuLight, Tianhe-2, Titan, and the K Computer along: peak performance, memory size, memory bandwidth, communication bandwidth, Gflops/Watt, Tflops/m^3, and Linpack, HPCG, HPGMG, and Graph500 results]
27 Again, a Multi-Objective Optimization Problem with a Budget Constraint
[Chart comparing peak performance (in 0.1 Pflops), memory size (TB), and price (million USD) for Sunway TaihuLight, Tianhe-2, Piz Daint, Titan, Sequoia, and the K Computer]
TaihuLight is a budget machine with aggressive compute performance.
29 Major Features to Consider
Sunway TaihuLight: 125 Pflops peak, 10 million cores
user-controlled 64 KB LDM on each CPE
32 GB memory and 136 GB/s memory bandwidth per node, i.e., 22 flops/byte
heterogeneous MPE + CPE design
register communication among CPEs
For comparison, the Intel KNL 7250 of Cori sits at roughly 6.5 flops/byte and the NVIDIA P100 of Piz Daint at roughly 7.2 flops/byte.
31 Major Challenge: the Memory Wall, which calls for refactoring and redesigning applications.
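The tiling pattern implied by the 64 KB LDM can be sketched as follows. This is a hypothetical stand-in, not the Athread API: plain list slicing plays the role of the DMA get/put transfers, and the tile size is derived from the 64 KB scratchpad split between an input and an output buffer.

```python
# Hypothetical sketch of the LDM streaming pattern: a CPE processes a large
# main-memory array one scratchpad-sized tile at a time (DMA get -> compute
# -> DMA put). Names and sizes are illustrative, not the real Athread API.

LDM_BYTES = 64 * 1024
ELEM_BYTES = 8                          # double precision
TILE = LDM_BYTES // (2 * ELEM_BYTES)    # half for input tile, half for output

def process_on_cpe(main_mem):
    """Scale every element by 2.0, streaming LDM-sized tiles."""
    out = [0.0] * len(main_mem)
    for start in range(0, len(main_mem), TILE):
        tile = main_mem[start:start + TILE]   # stands in for a DMA get
        tile = [2.0 * x for x in tile]        # compute entirely in "LDM"
        out[start:start + TILE] = tile        # stands in for a DMA put
    return out
```

The point of the pattern is that each element is loaded once, computed on at scratchpad speed, and written back once, which is what makes a 22 flops/byte machine usable.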
32 Programming Model on TaihuLight: MPI + X (X: Sunway OpenACC / Athread)
MPI: one MPI process runs on one management core (MPE)
Sunway OpenACC: conducts data transfer between main memory and local data memory (LDM), and distributes the kernel workload to the computing cores (CPEs)
Athread: the threading library that manages threads on the computing cores (CPEs); it is also used in the Sunway OpenACC implementation
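The division of labor described above can be sketched with a pure-Python stand-in: an "MPE" process partitions a kernel across 64 "CPE" workers and joins them, analogous to the spawn/join style of an Athread offload. The function names are illustrative, not the real library calls.

```python
# Minimal sketch of the MPE + CPE division of labor: the management core
# partitions the iteration space across 64 computing-core workers and waits
# for all of them, analogous to an Athread spawn/join. Illustrative only.
import threading

N_CPES = 64

def kernel(tid, src, dst):
    # Each "CPE" computes a contiguous slice of the iteration space.
    chunk = (len(src) + N_CPES - 1) // N_CPES
    lo, hi = tid * chunk, min((tid + 1) * chunk, len(src))
    for i in range(lo, hi):
        dst[i] = src[i] * src[i]

def mpe_offload(src):
    dst = [0] * len(src)
    workers = [threading.Thread(target=kernel, args=(t, src, dst))
               for t in range(N_CPES)]
    for w in workers:
        w.start()                      # plays the role of "spawn"
    for w in workers:
        w.join()                       # plays the role of "join"
    return dst
```

In the real model the outer MPI process corresponds to `mpe_offload`, and the per-thread body corresponds to the code that Sunway OpenACC or Athread runs on each CPE.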
33 Two Major Challenges The Memory Barrier The Application Barrier
34 Example (I): Nonlinear Earthquake Simulation on Sunway TaihuLight
Dynamic rupture source generator (originated from CG-FDM), with fault stress initialization and friction-law control
Seismic wave propagation (originated from AWP-ODC): velocity update, stress update, stress adjustment for plasticity, source injection, wave-equation solver, per timestep
Other utilities: source partitioner, 3D model interpolator (for the 3D velocity/density model), restart controller, snapshot/seismogram recorder with LZ4 compression, group I/O, and balanced I/O forwarding
35 Multi-Level Domain Decomposition
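A multi-level decomposition of this kind can be sketched as a block split of the global 3D grid across the process grid (the same helper then applies again when each process tiles its subdomain over the 64 CPEs). The grid sizes below are illustrative assumptions, not the simulation's actual configuration.

```python
# Sketch of block domain decomposition: each process's coordinates in the
# process grid determine the offset and size of its subdomain along each
# axis. Remainder points are spread over the lowest-ranked processes.

def subdomain(rank, procs, n):
    """1D block decomposition: return (offset, size) of `rank`'s slab of n points."""
    base, rem = divmod(n, procs)
    size = base + (1 if rank < rem else 0)
    offset = rank * base + min(rank, rem)
    return offset, size

def decompose3d(rank_xyz, procs_xyz, n_xyz):
    """Apply the 1D split independently along x, y, and z."""
    return [subdomain(r, p, n) for r, p, n in zip(rank_xyz, procs_xyz, n_xyz)]
```

For example, on a 2x2x2 process grid over a 10^3 grid, process (0,0,0) owns the (0..4)^3 corner block.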
36 A Balanced Memory Scheme
(1) array fusion: the separate field arrays (u, v, w and the stress components xx, yy, zz, xy, xz, yz) are fused into interleaved arrays before the dvelcx and dstrqc kernels
(2) halo exchange through register communication: each CPE moves its left and right boundary columns to neighbors via register communication, while the inner part is transferred by DMA
(3) an optimized blocking configuration guided by an analytical model
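The array-fusion step (1) can be sketched directly: instead of n separate field arrays, all fields of one grid point are stored adjacently, so a single contiguous DMA transfer moves every field of a block. A minimal sketch, with illustrative two-field-per-point data:

```python
# Sketch of array fusion: separate per-field arrays u, v, w, ... are
# interleaved point-by-point into one array, so one contiguous transfer
# carries all fields of a grid block instead of n strided transfers.

def fuse(*fields):
    """[u0, u1, ...], [v0, v1, ...] -> [u0, v0, u1, v1, ...]"""
    fused = []
    for point in zip(*fields):
        fused.extend(point)
    return fused

def unfuse(fused, nfields):
    """Inverse of fuse: recover the nfields original arrays."""
    return [fused[f::nfields] for f in range(nfields)]
```

After fusion, the kernel reads one interleaved record per grid point, which is the layout the dvelcx/dstrqc diagrams above depict.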
37 Weak Scaling
[Weak-scaling plot from 8K to 160K processes, against ideal linear scaling:
Linear: peak 10.7 Pflops, parallel efficiency 97.9%
Non-linear: peak 15.2 Pflops, parallel efficiency 80.1%
Linear+Compress: peak 14.2 Pflops, parallel efficiency 96.5%
Non-linear+Compress: peak 18.9 Pflops, parallel efficiency 79.5%]
38 Weak Scaling: Sustained Performance vs. Machine Balance
Titan: 27 Pflops peak, 5,475 TB/s memory bandwidth (0.2 byte/flop); best efficiency 1.6 Pflops, 11.9% (half machine)
TaihuLight: 125 Pflops peak, 4,473 TB/s memory bandwidth (0.04 byte/flop); Sunway without compression: 15.2 Pflops, 12.2%; with compression: 18.9 Pflops, 15.1%
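The parallel-efficiency figures quoted above follow the usual weak-scaling definition: sustained performance at P processes divided by the smallest run's performance scaled by P/P0. A one-function sketch (the sample values in the test are illustrative, not the measured data):

```python
# Weak-scaling parallel efficiency: perf(P) relative to ideal linear scaling
# from the smallest measured run (p0 processes at perf0).

def weak_scaling_efficiency(p0, perf0, p, perf):
    ideal = perf0 * (p / p0)   # perfect linear scaling from the p0 baseline
    return perf / ideal
```

So an 80.1% efficiency at 160K processes means the sustained performance reached 0.801 of what perfect scaling from the 8K-process baseline would give.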
39 Simulation Results
40 Two Major Challenges The Memory Barrier The Application Barrier
41 More and More Component Models
Core components: atmosphere model, ocean model, ice model, land model, connected by a coupler
Additional components: marine biology, space weather, atmospheric chemistry, dynamic ice, land biology, solid earth, hydrological processes
Boundaries: ocean-atmosphere, ocean-ice, land-atmosphere, ice-land
42 The Application Barrier
high complexity in the applications, and a heavy legacy code base (millions of lines of code)
extremely complicated MPMD programs with no hotspots (or hundreds of hotspots)
a misfit between the in-place design philosophy and the new architecture
a lack of people with interdisciplinary knowledge and experience
43 Application (II): Porting CESM and Redesigning CAM-SE for Sunway TaihuLight
CESM1.2.0 components: CAM5.0, POP2.0, CLM4.0, CICE4.0, coupled by CPL7
Tsinghua + BNU: 30+ professors and students; four component models, millions of lines of code
Large-scale run on Sunway TaihuLight: 24,000 MPI processes, over one million cores; 10-20x speedup for kernels, 2-3x speedup for the entire model
Reference: Refactoring and Optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight Supercomputer, in Proceedings of SC
44 OpenACC-Based Refactoring of CAM
CAM structure: after initialization, Dyn_run passes state variables and tracers to Phy_run1 and Phy_run2, which pass tracers (u, v) back to dynamics.
Manual transformation of loops: in Euler_step (Q min/max computation for lim8, the biharmonic mixing term, the 2D advection step with data packing, boundary exchange, and data extraction), loop nests are reordered and the (ie, q) loop pair is collapsed into a single ie_q loop of qsize*(nete-nets) iterations, annotated with !$ACC PARALLEL LOOP, with q and ie recovered as functions of ie_q.
Manual OpenACC parallelization and optimization on code and data structures: function calls inside the loops (func_1 through func_9 over dp, dp_star, Vstar, Qtens, etc.) are fused so each parallel loop computes its Qtens values in place.
Tool-based transformation of loops: the tphysbc() chunk loop (convect_deep_tend 6.47%, convect_shallow_tend 15.57%, macrop_driver_tend 8.38%, microp_aero_run 4.29%, microp_driver_tend 7.13%, aerosol_wet_intr 4.29%, convect_deep_tend_2 0.51%, radiation_tend 54.07%) is split so each physics routine, down through zm_conv_tend (zm_convr 2.03%, zm_conv_evap, montran, convtranc 0.06%), runs inside its own chunk loop.
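The loop-collapse transformation above can be sketched in a few lines: the (ie, q) pair becomes one flattened index so the parallel loop exposes qsize*(nete-nets) iterations instead of only (nete-nets). The index-recovery functions below are an illustrative reconstruction of the slide's q = func(ie_q) / ie = func(ie_q), not CAM's actual code.

```python
# Sketch of the (ie, q) -> ie_q loop collapse used in the OpenACC refactoring:
# the flattened loop exposes qsize * (nete - nets) parallel iterations, and
# the original indices are recovered arithmetically from ie_q.

def collapsed_indices(nets, nete, qsize):
    for ie_q in range(qsize * (nete - nets)):
        q = ie_q % qsize              # q  = func(ie_q)
        ie = nets + ie_q // qsize     # ie = func(ie_q)
        yield ie, q
```

The payoff is parallelism: with nete - nets often small per process, the collapsed loop gives the 64 CPEs qsize times more independent iterations to share.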
45 Athread-Based Fine-Grained Redesign
Step 1: rewrite the Fortran OpenACC code into Athread C code, for finer memory control through a specific DMA scheme and more efficient vectorization
Step 2: register-communication-based redesign, to remove data dependencies and expose more parallelism; partial sums held by the CPEs are combined across the mesh in staged register-communication steps (Stages 1-3)
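The staged combination in Step 2 can be sketched as a pairwise tree reduction: 8 "CPEs" in a row combine their partial sums in log2(8) = 3 stages, with each exchange standing in for a register-communication transfer. Plain Python lists replace the hardware buses here; this is an illustrative model, not the actual kernel.

```python
# Sketch of a staged, register-communication-style reduction: partial sums
# on a power-of-two number of CPEs are combined pairwise in log2(n) stages,
# mirroring the Stage 1 -> Stage 3 progression on the slide.

def staged_reduce(partials):
    """Tree-reduce a power-of-two-length list of partial sums."""
    vals = list(partials)
    stride = 1
    while stride < len(vals):
        for i in range(0, len(vals), 2 * stride):
            # CPE i receives its partner's partial over the row bus and adds it
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```

Because each stage only exchanges data between CPE pairs, no intermediate result has to round-trip through main memory, which is what removes the dependency on the slow memory path.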
46 Performance Improvement through Redesign
Depending on the kernel, one Sunway core group (64 CPEs) can be equivalent to 0.1x an Intel core, or 1.8x, or 7.2x, and in certain cases 43.1x an Intel core.
47 Scaling the Dynamic Core to Millions of Cores
[Scaling plot over number of processes, for configurations of 192, 768, and 650 elements per process (one further configuration is illegible); peaks of 3.3 Pflops (parallel efficiency 98.55%), 2.72 Pflops (92.9%), 2.4 Pflops (92.2%), and 1.76 Pflops (88.3%)]
48 Simulation of Hurricane Katrina
50 Outline Sunway Machine Applications and Programming Challenges Long Term Plans
51 Long Term Plan
Traditional HPC Applications (Science -> Service), e.g., 15-Pflops Nonlinear Earthquake Simulation on Sunway TaihuLight: Enabling Depiction of Realistic 10 Hz Scenarios, Gordon Bell Prize Finalist, SC
Deep Learning Related Applications
54 Long Term Plan: Deep Learning Related Applications
Training AlexNet with swCaffe: [chart of average time per iteration, comparing Intel (24 cores, 1.0x baseline), swBLAS (1 CG), and swDNN (1 CG) for the total, convolution, and fully connected portions; speedups of 2.2x, 2.7x, 3.5x, 4.5x, 7.1x, and up to 8.5x]
57 Long Term Plan
Traditional HPC Applications (Science -> Service)
Deep Learning Related Applications
Sunway Micro
Sunway Exa-Scale System
58 The Sunway Exa-Scale Plan
Sunway TaihuLight: NSCC-Wuxi, 260-core processor, 125 Pflops, 1st on TOP500
Sunway Exa-Pilot System: ~10 Tflops per node, 10~20 Gflops/W
Sunway Exa-Scale System: ~30 Gflops/W
FUJITSU HPC and the Development of the Post-K Supercomputer Toshiyuki Shimizu Vice President, System Development Division, Next Generation Technical Computing Unit 0 November 16 th, 2016 Post-K is currently
More informationAdaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics
Adaptive-Mesh-Refinement Hydrodynamic GPU Computation in Astrophysics H. Y. Schive ( 薛熙于 ) Graduate Institute of Physics, National Taiwan University Leung Center for Cosmology and Particle Astrophysics
More informationKey Technologies for 100 PFLOPS. Copyright 2014 FUJITSU LIMITED
Key Technologies for 100 PFLOPS How to keep the HPC-tree growing Molecular dynamics Computational materials Drug discovery Life-science Quantum chemistry Eigenvalue problem FFT Subatomic particle phys.
More informationHybrid KAUST Many Cores and OpenACC. Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS
+ Hybrid Computing @ KAUST Many Cores and OpenACC Alain Clo - KAUST Research Computing Saber Feki KAUST Supercomputing Lab Florent Lebeau - CAPS + Agenda Hybrid Computing n Hybrid Computing n From Multi-Physics
More informationIntroduction of Oakforest-PACS
Introduction of Oakforest-PACS Hiroshi Nakamura Director of Information Technology Center The Univ. of Tokyo (Director of JCAHPC) Outline Supercomputer deployment plan in Japan What is JCAHPC? Oakforest-PACS
More informationTestbed Overview in China Xiaohong Huang Beijing University of Posts and Telecommunications
Testbed Overview in China Xiaohong Huang Beijing University of Posts and Telecommunications CJK FI Workshop, Nov. 2011 Agenda Why we need testbeds PlanetLab, Emulab, OpenFlow, Seattle, ORBIT,... Idea Standardization
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Moore's Law abandoned serial programming around 2004 Courtesy Liberty Computer Architecture Research Group
More informationCS2214 COMPUTER ARCHITECTURE & ORGANIZATION SPRING Top 10 Supercomputers in the World as of November 2013*
CS2214 COMPUTER ARCHITECTURE & ORGANIZATION SPRING 2014 COMPUTERS : PRESENT, PAST & FUTURE Top 10 Supercomputers in the World as of November 2013* No Site Computer Cores Rmax + (TFLOPS) Rpeak (TFLOPS)
More informationMemory Footprint of Locality Information On Many-Core Platforms Brice Goglin Inria Bordeaux Sud-Ouest France 2018/05/25
ROME Workshop @ IPDPS Vancouver Memory Footprint of Locality Information On Many- Platforms Brice Goglin Inria Bordeaux Sud-Ouest France 2018/05/25 Locality Matters to HPC Applications Locality Matters
More information: Operating System 计算机原理与设计
0117401: Operating System 计算机原理与设计 Chapter 1-2: CS Structure 陈香兰 xlanchen@ustceducn http://staffustceducn/~xlanchen Computer Application Laboratory, CS, USTC @ Hefei Embedded System Laboratory, CS, USTC
More informationToward Automated Application Profiling on Cray Systems
Toward Automated Application Profiling on Cray Systems Charlene Yang, Brian Friesen, Thorsten Kurth, Brandon Cook NERSC at LBNL Samuel Williams CRD at LBNL I have a dream.. M.L.K. Collect performance data:
More informationHigh-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs
High-Order Finite-Element Earthquake Modeling on very Large Clusters of CPUs or GPUs Gordon Erlebacher Department of Scientific Computing Sept. 28, 2012 with Dimitri Komatitsch (Pau,France) David Michea
More informationFujitsu s Technologies to the K Computer
Fujitsu s Technologies to the K Computer - a journey to practical Petascale computing platform - June 21 nd, 2011 Motoi Okuda FUJITSU Ltd. Agenda The Next generation supercomputer project of Japan The
More informationGPU Developments in Atmospheric Sciences
GPU Developments in Atmospheric Sciences Stan Posey, HPC Program Manager, ESM Domain, NVIDIA (HQ), Santa Clara, CA, USA David Hall, Ph.D., Sr. Solutions Architect, NVIDIA, Boulder, CO, USA NVIDIA HPC UPDATE
More informationINSPUR and HPC Innovation
INSPUR and HPC Innovation Dong Qi (Forrest) Product manager Inspur dongqi@inspur.com Contents 1 2 3 4 5 Inspur introduction HPC Challenge and Inspur HPC strategy HPC cases Inspur contribution to HPC community
More informationLecture 1: Introduction and Computational Thinking
PASI Summer School Advanced Algorithmic Techniques for GPUs Lecture 1: Introduction and Computational Thinking 1 Course Objective To master the most commonly used algorithm techniques and computational
More informationBuilding NVLink for Developers
Building NVLink for Developers Unleashing programmatic, architectural and performance capabilities for accelerated computing Why NVLink TM? Simpler, Better and Faster Simplified Programming No specialized
More informationParallel Programming Futures: What We Have and Will Have Will Not Be Enough
Parallel Programming Futures: What We Have and Will Have Will Not Be Enough Michael Heroux SOS 22 March 27, 2018 Sandia National Laboratories is a multimission laboratory managed and operated by National
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Waiting for Moore s Law to save your serial code start getting bleak in 2004 Source: published SPECInt data Moore s Law is not at all
More informationAccelerated Earthquake Simulations
Accelerated Earthquake Simulations Alex Breuer Technische Universität München Germany 1 Acknowledgements Volkswagen Stiftung Project ASCETE: Advanced Simulation of Coupled Earthquake-Tsunami Events Bavarian
More informationAvalonMiner Raspberry Pi Configuration Guide. AvalonMiner 树莓派配置教程 AvalonMiner Raspberry Pi Configuration Guide
AvalonMiner 树莓派配置教程 AvalonMiner Raspberry Pi Configuration Guide 简介 我们通过使用烧录有 AvalonMiner 设备管理程序的树莓派作为控制器 使 用户能够通过控制器中管理程序的图形界面 来同时对多台 AvalonMiner 6.0 或 AvalonMiner 6.01 进行管理和调试 本教程将简要的说明 如何把 AvalonMiner
More informationIntroduction to the K computer
Introduction to the K computer Fumiyoshi Shoji Deputy Director Operations and Computer Technologies Div. Advanced Institute for Computational Science RIKEN Outline ü Overview of the K
More informationRunning the FIM and NIM Weather Models on GPUs
Running the FIM and NIM Weather Models on GPUs Mark Govett Tom Henderson, Jacques Middlecoff, Jim Rosinski, Paul Madden NOAA Earth System Research Laboratory Global Models 0 to 14 days 10 to 30 KM resolution
More informationENDURING DIFFERENTIATION Timothy Lanfear
ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING GPU-ACCELERATED PERFORMANCE 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 10 3 10 2 Single-threaded perf
More informationENDURING DIFFERENTIATION. Timothy Lanfear
ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 Transistors (thousands) 1.1X per year 10 3 10 2 Single-threaded
More informationMultiprotocol Label Switching The future of IP Backbone Technology
Multiprotocol Label Switching The future of IP Backbone Technology Computer Network Architecture For Postgraduates Chen Zhenxiang School of Information Science and Technology. University of Jinan (c) Chen
More information赛灵思技术日 XILINX TECHNOLOGY DAY 用赛灵思 FPGA 加速机器学习推断 张帆资深全球 AI 方案技术专家
赛灵思技术日 XILINX TECHNOLOGY DAY 用赛灵思 FPGA 加速机器学习推断 张帆资深全球 AI 方案技术专家 2019.03.19 Who is Xilinx? Why Should I choose FPGA? Only HW/SW configurable device 1 2 for fast changing networks High performance / low
More informationPetascale Computing Research Challenges
Petascale Computing Research Challenges - A Manycore Perspective Stephen Pawlowski Intel Senior Fellow GM, Architecture & Planning CTO, Digital Enterprise Group Yesterday, Today and Tomorrow in HPC ENIAC
More informationMachine Vision Market Analysis of 2015 Isabel Yang
Machine Vision Market Analysis of 2015 Isabel Yang CHINA Machine Vision Union Content 1 1.Machine Vision Market Analysis of 2015 Revenue of Machine Vision Industry in China 4,000 3,500 2012-2015 (Unit:
More informationMapping MPI+X Applications to Multi-GPU Architectures
Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under
More informationThe Exascale Era Has Arrived
Technology Spotlight The Exascale Era Has Arrived Sponsored by NVIDIA Steve Conway, Earl Joseph, Bob Sorensen, and Alex Norton November 2018 EXECUTIVE SUMMARY Earlier this year, scientists broke the exascale
More informationStan Posey, NVIDIA, Santa Clara, CA, USA
Stan Posey, sposey@nvidia.com NVIDIA, Santa Clara, CA, USA NVIDIA Strategy for CWO Modeling (Since 2010) Initial focus: CUDA applied to climate models and NWP research Opportunities to refactor code with
More information信息检索与搜索引擎 Introduction to Information Retrieval GESC1007
信息检索与搜索引擎 Introduction to Information Retrieval GESC1007 Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities philfv8@yahoo.com Spring 2019 1 Last week We have discussed about:
More informationAim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group
Aim High Intel Technical Update Teratec 07 Symposium June 20, 2007 Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Risk Factors Today s s presentations contain forward-looking statements.
More informationPost-K Development and Introducing DLU. Copyright 2017 FUJITSU LIMITED
Post-K Development and Introducing DLU 0 Fujitsu s HPC Development Timeline K computer The K computer is still competitive in various fields; from advanced research to manufacturing. Deep Learning Unit
More informationSTRATEGIES TO ACCELERATE VASP WITH GPUS USING OPENACC. Stefan Maintz, Dr. Markus Wetzstein
STRATEGIES TO ACCELERATE VASP WITH GPUS USING OPENACC Stefan Maintz, Dr. Markus Wetzstein smaintz@nvidia.com; mwetzstein@nvidia.com Companies Academia VASP USERS AND USAGE 12-25% of CPU cycles @ supercomputing
More informationOverview of Tianhe-2
Overview of Tianhe-2 (MilkyWay-2) Supercomputer Yutong Lu School of Computer Science, National University of Defense Technology; State Key Laboratory of High Performance Computing, China ytlu@nudt.edu.cn
More informationTESLA P100 PERFORMANCE GUIDE. Deep Learning and HPC Applications
TESLA P PERFORMANCE GUIDE Deep Learning and HPC Applications SEPTEMBER 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important
More informationHigh performance Computing and O&G Challenges
High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating
More information3D ADI Method for Fluid Simulation on Multiple GPUs. Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA
3D ADI Method for Fluid Simulation on Multiple GPUs Nikolai Sakharnykh, NVIDIA Nikolay Markovskiy, NVIDIA Introduction Fluid simulation using direct numerical methods Gives the most accurate result Requires
More informationMulti-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation
Multi-GPU Scaling of Direct Sparse Linear System Solver for Finite-Difference Frequency-Domain Photonic Simulation 1 Cheng-Han Du* I-Hsin Chung** Weichung Wang* * I n s t i t u t e o f A p p l i e d M
More informationComputer Networks. Wenzhong Li. Nanjing University
Computer Networks Wenzhong Li Nanjing University 1 Chapter 3. Packet Switching Networks Network Layer Functions Virtual Circuit and Datagram Networks ATM and Cell Switching X.25 and Frame Relay Routing
More informationACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015
ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 COMMODITY DISRUPTS CUSTOM SOURCE: Top500 ACCELERATED COMPUTING: THE PATH FORWARD It s time to start
More informationTechnologies and application performance. Marc Mendez-Bermond HPC Solutions Expert - Dell Technologies September 2017
Technologies and application performance Marc Mendez-Bermond HPC Solutions Expert - Dell Technologies September 2017 The landscape is changing We are no longer in the general purpose era the argument of
More informationBrand-New Vector Supercomputer
Brand-New Vector Supercomputer NEC Corporation IT Platform Division Shintaro MOMOSE SC13 1 New Product NEC Released A Brand-New Vector Supercomputer, SX-ACE Just Now. Vector Supercomputer for Memory Bandwidth
More informationNVIDIA GPU TECHNOLOGY UPDATE
NVIDIA GPU TECHNOLOGY UPDATE May 2015 Axel Koehler Senior Solutions Architect, NVIDIA NVIDIA: The VISUAL Computing Company GAMING DESIGN ENTERPRISE VIRTUALIZATION HPC & CLOUD SERVICE PROVIDERS AUTONOMOUS
More informationEfficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning
Efficient and Scalable Multi-Source Streaming Broadcast on Clusters for Deep Learning Ching-Hsiang Chu 1, Xiaoyi Lu 1, Ammar A. Awan 1, Hari Subramoni 1, Jahanzeb Hashmi 1, Bracy Elton 2 and Dhabaleswar
More informationSimulating Stencil-based Application on Future Xeon-Phi Processor
Simulating Stencil-based Application on Future Xeon-Phi Processor PMBS workshop at SC 15 Chitra Natarajan Carl Beckmann Anthony Nguyen Intel Corporation Intel Corporation Intel Corporation Mauricio Araya-Polo
More informationPresentations: Jack Dongarra, University of Tennessee & ORNL. The HPL Benchmark: Past, Present & Future. Mike Heroux, Sandia National Laboratories
HPC Benchmarking Presentations: Jack Dongarra, University of Tennessee & ORNL The HPL Benchmark: Past, Present & Future Mike Heroux, Sandia National Laboratories The HPCG Benchmark: Challenges It Presents
More informationINSPUR and HPC Innovation. Dong Qi (Forrest) Oversea PM
INSPUR and HPC Innovation Dong Qi (Forrest) Oversea PM dongqi@inspur.com Contents 1 2 3 4 5 Inspur introduction HPC Challenge and Inspur HPC strategy HPC cases Inspur contribution to HPC community Inspur
More informationTianhe-2, the world s fastest supercomputer. Shaohua Wu Senior HPC application development engineer
Tianhe-2, the world s fastest supercomputer Shaohua Wu Senior HPC application development engineer Inspur Inspur revenue 5.8 2010-2013 6.4 2011 2012 Unit: billion$ 8.8 2013 21% Staff: 14, 000+ 12% 10%
More information如何查看 Cache Engine 缓存中有哪些网站 /URL
如何查看 Cache Engine 缓存中有哪些网站 /URL 目录 简介 硬件与软件版本 处理日志 验证配置 相关信息 简介 本文解释如何设置处理日志记录什么网站 /URL 在 Cache Engine 被缓存 硬件与软件版本 使用这些硬件和软件版本, 此配置开发并且测试了 : Hardware:Cisco 缓存引擎 500 系列和 73xx 软件 :Cisco Cache 软件版本 2.3.0
More informationECE 574 Cluster Computing Lecture 2
ECE 574 Cluster Computing Lecture 2 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 24 January 2019 Announcements Put your name on HW#1 before turning in! 1 Top500 List November
More informationThe Future of the Message-Passing Interface. William Gropp
The Future of the Message-Passing Interface William Gropp www.cs.illinois.edu/~wgropp MPI and Supercomputing The Message Passing Interface (MPI) has been amazingly successful First released in 1992, it
More information