Moving Towards Exascale with Lessons Learned from GPU Computing. Wen-mei Hwu ECE, CS, PCI, NCSA University of Illinois at Urbana-Champaign
1 Moving Towards Exascale with Lessons Learned from GPU Computing Wen-mei Hwu ECE, CS, PCI, NCSA University of Illinois at Urbana-Champaign
2 Agenda: Blue Waters and recent progress in petascale GPU computing. Upcoming innovations in exascale computing: architecture, programming systems. Conclusion and discussions.
3 Blue Waters - Operational at Illinois since 3/2013. 13.3 PF peak, 1.6 PB DRAM, $250M. 120+ Gb/sec WAN, 10/40/100 Gb Ethernet Switch, 100 GB/sec IB Switch, >1 TB/sec sustained disk bandwidth. Spectra Logic (tape): 300 PBs. Sonexion (disk): 26 PBs.
4 Two Petascale Computing Systems: NCSA Blue Waters vs. ORNL Titan.
Vendors: Cray/AMD/NVIDIA vs. Cray/AMD/NVIDIA. Processors: Interlagos/Kepler vs. Interlagos/Kepler.
Total peak performance (PF): 13.3 vs. 27.1; CPU/GPU split (PF): 7.7/5.6 vs. 2.6/24.5.
Number of CPU chips: 49,504 vs. 18,688. Number of GPU chips: 4,224 vs. 18,688.
Interconnect: 3D Torus vs. 3D Torus.
On-line disk storage (PB): 26 (Blue Waters). Sustained disk transfer (TB/sec): >1 (Blue Waters).
Archival storage (PB): 300 (Blue Waters). Sustained tape transfer (GB/sec): 100 vs. 7.
(The slide also lists the amount of CPU memory, in TB, for both systems.)
5 Cray XK7 Nodes: Blue Waters contains 4,224 Cray XK7 compute nodes. Dual-socket node. One AMD Interlagos chip: 8 core modules, 32 threads, 156 GFs peak performance, 32 GBs memory, 51 GB/s bandwidth. One NVIDIA Kepler chip: 1.3 TFs peak performance, 6 GBs GDDR5 memory, 250 GB/sec bandwidth. Gemini Interconnect: same as XE6 nodes.
6 Cray XE6 Nodes: Blue Waters contains 22,640 Cray XE6 compute nodes. Dual-socket node. Two AMD Interlagos chips: 16 core modules, 64 threads, 313 GFs peak performance, 64 GBs memory, 102 GB/sec memory bandwidth. Gemini Interconnect: router chip & network interface; injection bandwidth (peak) 9.6 GB/sec per direction.
7 Science areas, teams, and codes on Blue Waters (the slide tabulates, for each code, which algorithm classes it uses: structured grids, unstructured grids, dense matrix, sparse matrix, N-body, Monte Carlo, FFT, PIC, significant I/O):
Climate and Weather (3 teams): CESM, GCRM, CM1/WRF, HOMME
Plasmas/Magnetosphere (2): H3D(M), VPIC, OSIRIS, Magtail/UPIC
Stellar Atmospheres and Supernovae (5): PPM, MAESTRO, CASTRO, SEDONA, ChaNGa, MS-FLUKSS
Cosmology (2): Enzo, pgadget
Combustion/Turbulence (2): PSDNS, DISTUF
General Relativity (2): Cactus, Harm3D, LazEV
Molecular Dynamics (4): AMBER, Gromacs, NAMD, LAMMPS
Quantum Chemistry (2): SIAL, GAMESS, NWChem
Material Science (3): NEMOS, OMEN, GW, QMCPACK
Earthquakes/Seismology (2): AWP-ODC, HERCULES, PLSQR, SPECFEM3D
Quantum Chromodynamics (1): Chroma, MILC, USQCD
Social Networks (1): EPISIMDEMICS
Evolution (1): Eve
Engineering/System of Systems (1): GRIPS, Revisit
Computer Science (1)
8 Initial Production Use Results.
NAMD: 100-million-atom benchmark with Langevin dynamics and PME once every 4 steps, from launch to finish, all I/O included. On 768 nodes, Kepler+Interlagos is 3.9X faster than Interlagos-only; 768 XK7 nodes are 1.8X faster than 768 XE6 nodes.
Chroma: Lattice QCD with a grid size of 48^3 x 512, running at the physical values of the quark masses. On 768 nodes, Kepler+Interlagos is 4.9X faster than Interlagos-only; 768 XK7 nodes are 2.4X faster than 768 XE6 nodes.
QMCPACK: Full run, graphite 4x4x1 (256 electrons), QMC followed by VMC. On 700 nodes, Kepler+Interlagos is 4.9X faster than Interlagos-only; 700 XK7 nodes are 2.7X faster than 700 XE6 nodes.
9 Blue Waters Science Breakthrough Example: determination of the structure of the HIV capsid at the atomic level. A collaborative effort of experimental groups at the U. of Pittsburgh and Vanderbilt U. and Schulten's computational team at the U. of Illinois. The 64-million-atom HIV capsid simulation models the process through which the capsid disassembles and releases its genetic material, a critical step in HIV infection and a potential target for antiviral drugs.
10 LESSON #1: Production GPU codes often involve complex tradeoffs.
11 Tridiagonal Solver: used in implicit finite difference methods, cubic spline interpolation, and pre-conditioners. An algorithm to find a solution of Ax = d, where A is an n-by-n tridiagonal matrix and d is an n-element vector.
12 An Efficient Storage Format for A: the three diagonals (sub-diagonal a, main diagonal b, super-diagonal c) are stored as three 1D arrays. No need for column indices; no need for row pointers.
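To make the three-array format concrete, here is a minimal sequential sketch of a tridiagonal solve on that layout (illustrative code, not the deck's GPU implementation; the function name is an assumption, with a as the sub-diagonal, b as the main diagonal, and c as the super-diagonal):

```c
#include <stdlib.h>

/* Solve Ax = d by the Thomas algorithm. A is tridiagonal, stored as
   three 1D arrays: a[i] (sub-diagonal, a[0] unused), b[i] (main
   diagonal), c[i] (super-diagonal, c[n-1] unused). Overwrites d with
   the solution x. */
void thomas_solve(int n, const double *a, const double *b_in,
                  const double *c, double *d)
{
    double *b = malloc(n * sizeof *b);   /* scratch copy of the diagonal */
    for (int i = 0; i < n; i++) b[i] = b_in[i];

    /* Forward elimination: remove the sub-diagonal. */
    for (int i = 1; i < n; i++) {
        double w = a[i] / b[i - 1];
        b[i] -= w * c[i - 1];
        d[i] -= w * d[i - 1];
    }
    /* Back substitution. */
    d[n - 1] /= b[n - 1];
    for (int i = n - 2; i >= 0; i--)
        d[i] = (d[i] - c[i] * d[i + 1]) / b[i];

    free(b);
}
```

Note that the elimination divides by b[i-1] with no pivoting; that is precisely the stability gap the SPIKE diagonal-pivoting work on the following slides addresses.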
13 GPU Tridiagonal System Solver Building Blocks. Hybrid methods: PCR-Thomas (Kim 2011, Davidson 2011), PCR-CR (CUSPARSE 2012), etc. CUSPARSE is supported by NVIDIA. (Diagrams: one step each of Thomas (sequential), Cyclic Reduction, and PCR on the a/b/c coefficient arrays, plus the interleaved layout after partitioning.)
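As a sketch of the PCR building block (following the standard formulation, not code from the deck), one parallel cyclic reduction step lets each equation i eliminate its coupling to equations i-s and i+s; every update is independent, so on a GPU each row maps naturally to a thread:

```c
/* One PCR step with stride s over the system (a, b, c | d), writing
   the reduced system into (a2, b2, c2 | d2). Out-of-range neighbors
   are treated as identity rows (coefficient 0). */
void pcr_step(int n, int s,
              const double *a, const double *b, const double *c,
              const double *d,
              double *a2, double *b2, double *c2, double *d2)
{
    for (int i = 0; i < n; i++) {  /* each iteration is independent */
        double k1 = (i - s >= 0) ? a[i] / b[i - s] : 0.0;
        double k2 = (i + s < n)  ? c[i] / b[i + s] : 0.0;
        a2[i] = (i - s >= 0) ? -a[i - s] * k1 : 0.0;
        c2[i] = (i + s < n)  ? -c[i + s] * k2 : 0.0;
        b2[i] = b[i] - ((i - s >= 0) ? c[i - s] * k1 : 0.0)
                     - ((i + s < n)  ? a[i + s] * k2 : 0.0);
        d2[i] = d[i] - ((i - s >= 0) ? d[i - s] * k1 : 0.0)
                     - ((i + s < n)  ? d[i + s] * k2 : 0.0);
    }
}
```

After about log2(n) steps with doubling stride the equations decouple and each unknown is simply d[i]/b[i]; the hybrid methods above instead switch to Thomas or CR once the partitions are small enough.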
14 GPU Performance Advantage: runtime of solving an 8M-row matrix (ms), on random and diagonally dominant matrices, comparing our SPIKE-diag_pivoting (GPU), our SPIKE-Thomas (GPU), CUSPARSE (GPU), data transfer (pageable), data transfer (pinned), and MKL dgtsv (sequential, CPU). SAMOS 2013
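The pageable vs. pinned bars refer to CUDA's two host-to-device copy paths; a minimal sketch of the difference (buffer names and the size parameter are illustrative):

```c
#include <stdlib.h>
#include <cuda_runtime.h>

void transfer_paths(size_t bytes)
{
    float *h_pageable = (float *)malloc(bytes);   /* ordinary host memory */
    float *h_pinned, *d_buf;
    cudaMallocHost((void **)&h_pinned, bytes);    /* page-locked memory   */
    cudaMalloc((void **)&d_buf, bytes);

    /* Pageable path: the driver stages through an internal pinned buffer. */
    cudaMemcpy(d_buf, h_pageable, bytes, cudaMemcpyHostToDevice);
    /* Pinned path: the DMA engine reads the host buffer directly, which
       is typically faster over PCIe. */
    cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);

    cudaFree(d_buf);
    cudaFreeHost(h_pinned);
    free(h_pageable);
}
```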
15 Relative Backward Error, by matrix type, for SPIKE-diag_pivoting, SPIKE-Thomas, CUSPARSE, MKL, Intel SPIKE, and Matlab. (Table over 16 matrix types; on the hardest types several solvers overflow to NaN/Inf or fail outright, while SPIKE with diagonal pivoting stays accurate.) SAMOS 2013
16 Numerically Stable Partitioning: SPIKE (Polizzi et al), with diagonal pivoting for numerical stability. SX = Y (5), DY = F (6), A_i Y_i = F_i (7). SAMOS 2013
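Spelled out, equations (5)-(7) are the two-stage SPIKE solve; a sketch of the factorization behind them, following Polizzi et al.:

```latex
% Partition the tridiagonal A into p diagonal blocks A_1, ..., A_p and
% factor it as a block-diagonal matrix times the "spike" matrix S:
\[
  A = D S, \qquad D = \operatorname{diag}(A_1, \dots, A_p).
\]
% Stage 1 (eqs. 6-7): solve DY = F, which splits into independent
% per-partition solves A_i Y_i = F_i, one per GPU thread block, each
% using diagonal pivoting for stability.
% Stage 2 (eq. 5): solve SX = Y. Since S is the identity plus thin
% spike columns, this reduces to a small system coupling only the
% unknowns at the partition interfaces.
\[
  DY = F \;\Leftrightarrow\; A_i Y_i = F_i \ (i = 1, \dots, p),
  \qquad\text{then}\quad S X = Y.
\]
```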
17 Fast Transposition on GPU. Chang, et al, SC 2012. SAMOS 2013
18 Dynamic Tiling: runtime (ms) of data layout only vs. dynamic tiling (with data layout), on random, diagonally dominant, and zero-diagonal matrices. SAMOS 2013
19 Relative Backward Error (the table from slide 15, repeated). Chang, et al, SC 2012. SAMOS 2013
20 GPU Performance Advantage (the chart from slide 14, repeated, for an 8M-row matrix). Chang, et al, SC 2012. SAMOS 2013
21 Some Important Trends: Driven by the mobile market, computing devices are quickly becoming SoCs, integrating CPU, GPU, and I/O with DRAM channels. Compute outside the CPU cores is gaining importance, and off-chip data movement must be minimized. CUDA and HSA (Heterogeneous System Architecture) are leading these trends. New OS/runtime and programming systems are emerging to take advantage of these advances.
22 NEW INNOVATIONS IN EXASCALE COMPUTING
23 UNIFIED ADDRESS SPACE AND SOC DISTRIBUTED COMPUTE for acceleration of at-scale data communication and compute - Innovation in HSA and OS
24 State-of-the-Art Data Transfer Behavior. (Diagram: data is staged in main memory (DRAM) by the CPU and moved by DMA across PCIe to device memory on the GPU card, or other accelerator cards; network I/O with compression engine and disk I/O sit on the CPU side.)
25 Desired SoC Data Transfer Behavior. (Diagram: a single SoC whose CPU and GPU share a last-level cache (LLC), with DMA, network I/O with compression engine, and disk I/O all reaching main memory (DRAM) directly.)
26 UNIFIED COHERENT MEMORY for acceleration of pointer-based data structures - Innovation in CUDA and HSA
27 Legacy Access to Large Structures. (Diagram: a large 3D spatial data structure in SYSTEM MEMORY; GPU MEMORY is smaller, so the GPU has to copy and process in chunks.)
28 Legacy Access to Large Structures. (Diagram: the structure's levels l1-l5 in SYSTEM MEMORY, beside the GPU and its separate, smaller GPU MEMORY.)
29 Copy One Chunk at a Time. (Diagram: a copy of the top 2 levels of the hierarchy moves from SYSTEM MEMORY into GPU MEMORY for the KERNEL.)
30-32 Process One Chunk at a Time. (Three animation frames: the FIRST KERNEL processes the copied top levels in GPU MEMORY.)
33 Copy One Chunk at a Time. (Diagram: a copy of the bottom 3 levels of one branch of the hierarchy moves into GPU MEMORY.)
34 Process One Chunk at a Time. (Diagram: the SECOND KERNEL processes the copied branch in GPU MEMORY alongside the top levels.)
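A hedged CUDA sketch of this legacy chunked pattern (all names, sizes, and launch shapes are illustrative, not from the deck): copy the top levels once, run the first kernel, then stream each branch of the bottom levels through the smaller GPU memory:

```cuda
#include <cuda_runtime.h>

__global__ void first_kernel(const char *top) { /* process levels 1-2 */ }
__global__ void second_kernel(const char *top,
                              const char *branch) { /* levels 3-5 */ }

void process_in_chunks(const char *h_top, size_t top_bytes,
                       char **h_branch, size_t branch_bytes,
                       int num_branches)
{
    char *d_top, *d_branch;
    cudaMalloc((void **)&d_top, top_bytes);
    cudaMalloc((void **)&d_branch, branch_bytes);  /* one branch at a time */

    cudaMemcpy(d_top, h_top, top_bytes, cudaMemcpyHostToDevice);
    first_kernel<<<256, 256>>>(d_top);             /* top 2 levels */

    for (int i = 0; i < num_branches; i++) {       /* bottom 3 levels */
        cudaMemcpy(d_branch, h_branch[i], branch_bytes,
                   cudaMemcpyHostToDevice);
        second_kernel<<<256, 256>>>(d_top, d_branch);
    }
    cudaDeviceSynchronize();
    cudaFree(d_branch);
    cudaFree(d_top);
}
```

Every pointer inside the copied chunks must also be translated to device addresses, which is a large part of why this style is so painful for pointer-based structures.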
35 Large Spatial Data Structure with HSA and full OpenCL 2.0. (Diagram: the entire large 3D spatial data structure, levels l1-l5, stays in SYSTEM MEMORY and the KERNEL accesses it in place.)
36-39 GPU Can Traverse the Entire Hierarchy (HSA). (Four animation frames: the KERNEL follows pointers through levels l1-l5 directly in SYSTEM MEMORY.)
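With HSA or CUDA managed memory the chunking and pointer translation disappear; a minimal sketch, assuming an octree-style node type and a traversal kernel (both illustrative):

```cuda
#include <cuda_runtime.h>

struct Node {
    float value;
    Node *child[8];              /* 3D spatial hierarchy, octree-style */
};

__global__ void traverse(Node *root)
{
    /* Threads can dereference root->child[...] through all levels;
       the traversal body is elided in this sketch. */
}

Node *alloc_node(void)
{
    Node *n;
    cudaMallocManaged((void **)&n, sizeof *n);   /* one address space */
    return n;
}

void run(Node *root)             /* root built on the CPU via alloc_node */
{
    traverse<<<256, 256>>>(root);   /* GPU follows CPU-built pointers */
    cudaDeviceSynchronize();
}
```

The same pointers are valid on both sides, so the kernel walks the entire l1-l5 hierarchy in place, exactly as the frames above show.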
40 LIBRARY-DRIVEN COMPILER OPTIMIZATION: Enabling high-performance programming using high-level languages - Innovation in Triolet and Tangram. Blue Waters SETAC Meeting - Feb 2014
41 Current State of Programming Heterogeneous Systems CPU Multicore Xeon Phi GPU FPGA SoC for HPC
42 Current State of Programming Heterogeneous Systems C/FORTRAN CPU Multicore Xeon Phi GPU FPGA SoC for HPC
43 Current State of Programming Heterogeneous Systems C/FORTRAN + Directives (OpenMP/ TBB) CPU Multicore Xeon Phi GPU FPGA SoC for HPC
44 Current State of Programming Heterogeneous Systems C/FORTRAN + Directives (OpenMP/ TBB) + SIMD Intrinsics CPU Multicore Xeon Phi GPU FPGA SoC for HPC
45 Current State of Programming Heterogeneous Systems C/FORTRAN OpenCL + Directives (OpenMP/ TBB) + SIMD Intrinsics CPU Multicore Xeon Phi GPU FPGA SoC for HPC
46 Current State of Programming Heterogeneous Systems C/FORTRAN OpenCL + Directives (OpenMP/ TBB) + SIMD Intrinsics Verilog/VHDL CPU Multicore Xeon Phi GPU FPGA SoC for HPC
47 Programming heterogeneous systems requires too many versions of code! C/FORTRAN OpenCL + Directives (OpenMP/ TBB) + SIMD Intrinsics Verilog/VHDL CPU Multicore Xeon Phi GPU FPGA SoC for HPC
48 Triolet Triolet Compiler C/FORTRAN OpenCL + Directives (OpenMP/ TBB) + SIMD Intrinsics Verilog/VHDL CPU Multicore Xeon Phi GPU FPGA SoC for HPC
49 Programming in Triolet: Python-syntax nonuniform FT (real part). SoC for HPC
50 Programming in Triolet: Python-syntax nonuniform FT (real part), inner loop highlighted: ys = [ sum(x * cos(r*k) for (x, k) in zip(xs, ks)) for r in par(rs)] SoC for HPC
51 Programming in Triolet: inner and outer loops highlighted: ys = [ sum(x * cos(r*k) for (x, k) in zip(xs, ks)) for r in par(rs)] SoC for HPC
52 Programming in Triolet: ys = [ sum(x * cos(r*k) for (x, k) in zip(xs, ks)) for r in par(rs)]. Map- and reduce-style programming: no new paradigm to learn. Parallel details are implicit: easy to use. Automated data partitioning, MPI rank generation, MPI messaging, OpenMP, etc. SoC for HPC
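For reference, a plain C rendering of the same computation (names and types are illustrative): par(rs) marks the outer loop as the parallel, distributed dimension, while the zip/sum comprehension is the per-element inner reduction that Triolet's compiler fuses:

```c
#include <math.h>

/* Real part of a nonuniform FT: ys[i] = sum_j xs[j] * cos(rs[i] * ks[j]),
   with xs, ks of length n and rs, ys of length m. */
void nuft_real(int m, int n, const float *rs,
               const float *xs, const float *ks, float *ys)
{
    for (int i = 0; i < m; i++) {       /* outer loop: par(rs) in Triolet */
        float s = 0.0f;
        for (int j = 0; j < n; j++)     /* inner loop: the zipped reduction */
            s += xs[j] * cosf(rs[i] * ks[j]);
        ys[i] = s;
    }
}
```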
53 Library-Driven Optimization. The basic idea: compilers are smart (inlining, unboxing, ...), and software abstractions have no overhead when the compiler can evaluate them statically. So just write components as library code! Embody optimizations in the library code as alternative function implementations, and use auto-tuning to select a final arrangement. Triolet uses this approach [Rodrigues, et al '14]. Prior work demonstrates loop fusion, shared-memory parallelism, vectorization [Coutts et al. '07, Keller et al. '10, Mainland et al. '13].
54 Parallel Loop Fusion (based on Keller et al. '10): sum(map(square, range(k))). range(k): there are k outputs; the n-th output is n. map(square, ...): loops over its inputs; the n-th output is square(n-th input). sum(...): adds the n-th input to the result. After fusion: there are k iterations, and each iteration adds square(n) to the result.
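In C terms, the fusion on this slide removes both the intermediate array and two of the three loops; a minimal before/after sketch (illustrative, not compiler output):

```c
#include <stdlib.h>

/* Before fusion: sum(map(square, range(k))) as three separate loops
   with a materialized intermediate array. */
long sum_unfused(int k)
{
    long *tmp = malloc(k * sizeof *tmp);
    for (int n = 0; n < k; n++) tmp[n] = n;            /* range(k)      */
    for (int n = 0; n < k; n++) tmp[n] *= tmp[n];      /* map(square,.) */
    long s = 0;
    for (int n = 0; n < k; n++) s += tmp[n];           /* sum(...)      */
    free(tmp);
    return s;
}

/* After fusion: k iterations, each adding square(n) to the result. */
long sum_fused(int k)
{
    long s = 0;
    for (int n = 0; n < k; n++)
        s += (long)n * n;
    return s;
}
```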
55 Triolet Productivity and Performance: ys = [sum(x * cos(r*k) for (x, k) in zip(xs, ks)) for r in par(rs)]. Library functions factor out data decomposition, parallelism, and communication. 128-way speedup (16 cores x 8 nodes): Triolet vs. C with MPI+OpenMP. (The slide contrasts the one-line Triolet version with roughly a page of C: MPI setup and broadcasts, scatter of input chunks, an OpenMP-parallel compute loop over each chunk, an MPI_Gather of the results, and cleanup.) SoC for HPC
56 Conclusion: Blue Waters and related HPC systems have demonstrated that GPU computing can have a narrow but deep impact on science. New architectural innovations enable much easier work migration between CPUs and accelerators. New programming-systems innovations enable much faster application development and lower maintenance costs.
57 Acknowledgements (IWCSE): D. August (Princeton), S. Baghsorkhi (Illinois), N. Bell (NVIDIA), D. Callahan (Microsoft), J. Cohen (NVIDIA), B. Dally (Stanford), J. Demmel (Berkeley), P. Dubey (Intel), M. Frank (Intel), M. Garland (NVIDIA), Isaac Gelado (BSC), M. Gschwind (IBM), R. Hank (Google), J. Hennessy (Stanford), P. Hanrahan (Stanford), M. Houston (AMD), T. Huang (Illinois), D. Kaeli (NEU), K. Keutzer (Berkeley), W. Kramer (NCSA), I. Gelado (NVIDIA), B. Gropp (Illinois), D. Kirk (NVIDIA), D. Kuck (Intel), S. Mahlke (Michigan), T. Mattson (Intel), N. Navarro (UPC), J. Owens (Davis), D. Padua (Illinois), S. Patel (Illinois), Y. Patt (Texas), D. Patterson (Berkeley), C. Rodrigues (Illinois), P. Rogers (AMD), S. Ryoo (ZeroSoft), B. Sander (AMD), K. Schulten (Illinois), B. Smith (Microsoft), M. Snir (Illinois), I. Sung (Illinois), P. Stenstrom (Chalmers), J. Stone (Illinois), S. Stone (Harvard), J. Stratton (Illinois), H. Takizawa (Tohoku), M. Valero (UPC). And many others!