NVIDIA : FLOP WARS, ÉPISODE III François Courteille Ecole Polytechnique 4-June-13
1 NVIDIA : FLOP WARS, ÉPISODE III. François Courteille (fcourteille@nvidia.com), Ecole Polytechnique, 4-June-13
2 OUTLINE
- NVIDIA and GPU Computing Roadmap
- Inside Kepler Architecture: SMX, Hyper-Q, Dynamic Parallelism
- Computing and Visualizing: OpenGL support
- Programming GPUs: the software ecosystem; OpenACC; libraries; languages and frameworks
- Application porting examples: MiniFE & Enzo
3 NVIDIA: Core Technologies and Brands
Founded 1993; invented the GPU in 1999. Core technologies span computer graphics, GPUs, mobile and cloud:
- GPU: GeForce, Quadro, Tesla
- Mobile: Tegra
- Cloud: VGX, GeForce GRID
5 The March of GPUs
[Figure: two charts comparing NVIDIA GPUs (M1060, Fermi M2070, Fermi+, Kepler; ECC off) with x86 CPUs (Nehalem 3 GHz, Westmere 3 GHz, Sandy Bridge 3 GHz): peak double-precision FP in Gflops/s and peak memory bandwidth in GBytes/s.]
6 Tesla CUDA Architecture Roadmap
[Figure: DP GFLOPS per watt by generation: Tesla (CUDA), Fermi (FP64), Kepler (Dynamic Parallelism), Maxwell (Unified Virtual Memory), Volta (Stacked DRAM).]
7 NVIDIA Tesla GPUs for HPC
8 NVIDIA Tesla Series Products: Data Center and Workstation
9 Kepler GPU: Fastest, Most Efficient HPC Architecture Ever
- SMX: 3x performance per watt
- Hyper-Q: easy speed-up for legacy MPI apps
- Dynamic Parallelism: parallel programming made easier than ever
10 Markets: Supercomputing, Weather/Climate Modeling, Molecular Dynamics, Computational Physics, Manufacturing, Life Sciences, Defense/Govt, Oil and Gas. Applications: structural mechanics, computational fluid dynamics (CFD), electromagnetics, biochemistry, bioinformatics, material science, signal processing, image processing, video analytics, reverse time migration, Kirchhoff time migration.
[Figure: product timeline over Q2 to Q4: Tesla M2090 and Tesla M2075 (Fermi), Tesla K10 (Kepler GK104), Tesla K20 (Kepler GK110).]
11 Tesla K10: Same Power, 2x Performance of Fermi
- GPU architecture: M2090 Fermi; K10 Kepler GK104
- Number of GPUs: M2090 1; K10 2
- Single-precision FLOPS: M2090 1.3 TF; K10 4.58 TF per board (2.29 TF per GPU)
- Double-precision FLOPS: M2090 0.66 TF; K10 … TF per board (… TF per GPU)
- CUDA cores: …
- Memory size: M2090 6 GB; K10 8 GB per board (4 GB per GPU)
- Memory bandwidth (ECC off): M2090 … GB/s; K10 320 GB/s per board (160 GB/s per GPU)
- PCI-Express: M2090 Gen 2 (8 GB/s); K10 Gen 3 (16 GB/s)
- Board power: 225 W for both
12 Tesla K10 vs M2090: 2x Performance / Watt
[Figure: K10 vs M2090 performance per watt for seismic processing, LAMMPS, NAMD, AMBER*, radio astronomy cross-correlator, N-body and defense (integer ops).]
* 2 instances of AMBER running JAC
13 Tesla K20 Family: 3x Faster Than Fermi
- CUDA cores: K20X …; K20 …
- Peak double precision: K20X 1.32 TF; K20 1.17 TF
- Peak DGEMM: K20X 1.22 TF; K20 1.10 TF
- Peak single precision: K20X 3.95 TF; K20 3.52 TF
- Peak SGEMM: K20X 2.90 TF; K20 2.61 TF
- Memory bandwidth: K20X 250 GB/s; K20 208 GB/s
- Memory size: K20X 6 GB; K20 5 GB
- Total board power: K20X 235 W; K20 225 W
[Figure: double-precision FLOPS (DGEMM): Tesla K20X 1.22 TFLOPS vs Tesla M2090 0.43 TFLOPS and a Xeon E… CPU.]
14 Tesla K20X: Faster, More Efficient
[Figure: Tesla K20X double precision (DGEMM, TFLOPS; 94% efficiency), single precision (SGEMM, TFLOPS) and memory bandwidth (STREAM Triad, GB/s; 70% efficiency). Source: Intel.]
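The efficiency figures on these two slides are ratios of a measured rate to the theoretical peak, and the peak itself is units × flops per cycle × clock. A minimal C sketch; the function names are hypothetical, and it assumes (none of this is stated on the slides) that each floating-point unit retires one fused multiply-add, counted as 2 flops, per cycle:

```c
#include <assert.h>

/* Theoretical peak in GFLOPS. Assumption: every floating-point unit retires
   one fused multiply-add (counted as 2 flops) per clock cycle. */
double peak_gflops(int fp_units, double clock_ghz)
{
    assert(fp_units > 0 && clock_ghz > 0.0);
    return fp_units * 2.0 * clock_ghz;
}

/* Fraction of the theoretical peak that a benchmark actually sustains
   (both arguments in the same unit, e.g. TFLOPS). */
double efficiency(double sustained, double peak)
{
    assert(sustained >= 0.0 && peak > 0.0);
    return sustained / peak;
}
```

If one further assumes a K20X has 896 FP64 units (14 SMX × 64) clocked at 732 MHz, `peak_gflops(896, 0.732)` gives about 1312 GFLOPS, in line with the 1.31 TFLOPS DP quoted for the K20X elsewhere in this deck. With the DGEMM figures from the K20 family slide, `efficiency(1.22, 1.32)` is about 0.92 for the K20X and `efficiency(1.10, 1.17)` about 0.94 for the K20.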
15 Up to 10x on Leading Applications
[Figure: performance across science domains, speedup vs. dual-socket CPUs, for WL-LSMS (material science), Chroma (physics), SPECFEM3D (earth sciences) and AMBER (molecular dynamics); configurations 1x CPU + 1x M2090 and 1x CPU + 1x K20X. CPU: E5-2687W 3.10 GHz Sandy Bridge.]
16 Titan: World's Fastest Supercomputer
- 18,688 Tesla K20X GPUs
- 27 petaflops peak, 90% of it from the GPUs
- … petaflops sustained performance on Linpack
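The "90% of performance from GPUs" figure can be checked with back-of-the-envelope arithmetic. A minimal sketch with hypothetical function names, taking the 1.31 TFLOPS DP peak per K20X quoted later in this deck and, as a simplifying assumption, ignoring the CPUs' contribution:

```c
#include <assert.h>

/* Aggregate GPU peak in petaflops from a GPU count and a per-GPU peak in
   teraflops (1 PF = 1000 TF). CPU contribution is deliberately ignored. */
double gpu_peak_pflops(long gpus, double tflops_per_gpu)
{
    assert(gpus > 0 && tflops_per_gpu > 0.0);
    return (double)gpus * tflops_per_gpu / 1000.0;
}

/* Share of a system's total peak that comes from its GPUs. */
double gpu_share(long gpus, double tflops_per_gpu, double system_peak_pflops)
{
    assert(system_peak_pflops > 0.0);
    return gpu_peak_pflops(gpus, tflops_per_gpu) / system_peak_pflops;
}
```

`gpu_peak_pflops(18688, 1.31)` comes to about 24.5 PF, and `gpu_share(18688, 1.31, 27.0)` to about 0.91, consistent with the slide's 90%.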
17 World's Most Energy-Efficient Supercomputer: Greener than Xeon Phi and Xeon CPU Systems
CINECA Eurora (liquid-cooled Eurotech Aurora Tigon) with 128 Tesla K20 accelerators reaches 3150 MFLOPS/Watt, saving $100k in energy and … tons of CO2 per year.
[Figure: MFLOPS/Watt of CINECA Eurora (Tesla K20), NICS Beacon (greenest Xeon Phi system) and C-DAC (greenest CPU system).]
18 GPU Test Drive: Double Your Fermi Performance with Kepler GPUs
19 Tesla K20/K20X Details
20 Kepler GK110 Block Diagram
Architecture:
- 7.1B transistors
- 15 SMX units
- > 1 TFLOP FP64
- … MB L2 cache
- 384-bit GDDR5
- PCI Express Gen2/Gen3
21 Kepler GK110 SMX vs Fermi SM
- 3x sustained perf/W
- Ground-up redesign for perf/W
- 6x the SP FP units
- 4x the DP FP units
- Significantly slower FU clocks
Processors are getting wider, not faster.
22 Hyper-Q
23 Hyper-Q Improves Concurrency
[Figure: three CUDA streams, Stream 1 (A-B-C), Stream 2 (P-Q-R) and Stream 3 (X-Y-Z), fed either into a single hardware work queue (A-B-C P-Q-R X-Y-Z) or into multiple hardware work queues, one per stream.]
- Fermi allows 16-way concurrency: up to 16 grids can run at once. But CUDA streams multiplex into a single queue, so work overlaps only at stream edges.
- Kepler allows 32-way concurrency with one work queue per stream, giving concurrency at the full-stream level: no inter-stream dependencies and any launch ordering. Because the streams are separate, [ABC], [PQR] and [XYZ] run concurrently.
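A toy timing model makes the "overlap only at stream edges" point concrete. It is a sketch under simplifying assumptions (unit-length kernels, unlimited concurrent grids, none of the real scheduler's details), and the names `Launch`, `makespan` and `slide_order` are hypothetical:

```c
#include <assert.h>

#define MAX_STREAMS 32

/* One kernel launch, tagged with the CUDA stream it was issued to. */
typedef struct { int stream; } Launch;

/* The slide's example, in launch order: A,B,C on stream 0,
   P,Q,R on stream 1, X,Y,Z on stream 2. */
static const Launch slide_order[9] = {
    {0}, {0}, {0}, {1}, {1}, {1}, {2}, {2}, {2}
};

/* Toy model, not the real scheduler: every kernel takes one time unit, any
   number of independent kernels may run together, and a kernel waits for the
   previous kernel of its own stream. With single_queue != 0 (Fermi-like),
   launches leave one shared queue in order, so a kernel also cannot start
   before the launch ahead of it. With single_queue == 0 (Hyper-Q-like),
   each stream has its own queue. Returns the time the last kernel finishes. */
int makespan(const Launch *q, int count, int nstreams, int single_queue)
{
    int stream_done[MAX_STREAMS] = {0};
    int prev_start = 0, end = 0;
    assert(nstreams > 0 && nstreams <= MAX_STREAMS);
    for (int k = 0; k < count; k++) {
        int s = q[k].stream;
        assert(s >= 0 && s < nstreams);
        int start = stream_done[s];          /* same-stream dependency */
        if (single_queue && start < prev_start)
            start = prev_start;              /* head-of-queue blocking */
        prev_start = start;
        stream_done[s] = start + 1;
        if (stream_done[s] > end)
            end = stream_done[s];
    }
    return end;
}
```

For the A-B-C / P-Q-R / X-Y-Z launch order, the single-queue model finishes at t = 7 because only C/P and R/X overlap, while per-stream queues let all three chains run side by side and finish at t = 3.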
24 Hyper-Q: Max GPU Utilization, Slashes CPU Idle Time
[Figure: GPU utilization (%) over time.]
25 Better Utilization with Hyper-Q
- Fermi: 1 work queue
- Kepler: 32 concurrent work queues; the Grid Management Unit selects the most appropriate task from up to 32 hardware queues (CUDA streams)
- Improves scheduling of concurrently executed grids
- Particularly interesting for MPI applications when combined with the Multi-Process Service (MPS), but not limited to MPI applications
26 Hyper-Q with Multiple MPI Ranks: CP2K
Hyper-Q with multiple MPI ranks leads to a 2.5X speedup over a single MPI rank using the GPU (blog post by Peter Messmer of NVIDIA).
27 Dynamic Parallelism: Simpler Code, More General, Higher Performance
Launch new kernels from the Kepler GPU, not only from the CPU:
- Dynamically: based on run-time data
- Simultaneously: from multiple threads at once
- Independently: each thread can launch a different grid
This gives better load balancing for dynamic workloads where the work per block is data-dependent (e.g. adaptive-mesh CFD), so the grid is neither too coarse nor too fine but just right.
28 Unified Runtime Interface
The same launch syntax is used on the CPU and, with Dynamic Parallelism, inside a kernel: CPU main launches A, B and C; GPU kernel B launches X, Y and Z.

    __global__ void B(float *data) {
        do_stuff(data);
        X<<<...>>>(data);
        Y<<<...>>>(data);
        Z<<<...>>>(data);
        cudaDeviceSynchronize();
        do_more_stuff(data);
    }

    int main() {
        float *data;
        setup(data);
        A<<<...>>>(data);
        B<<<...>>>(data);
        C<<<...>>>(data);
        cudaDeviceSynchronize();
        return 0;
    }
29 Dynamic Parallelism: Better Aggregation of Small Tasks
Example: stellar simulation of a supernova, with 100s to 1000s of matrices per radial section.
Batched LU decomposition with Kepler [Figure: a GPU control grid whose threads launch dswap(), dscal(), dtrsm() and dgemm() grids]:
- Each GPU thread in the control grid controls one matrix (e.g. one LU decomposition)
- Each thread launches new GPU grids for the BLAS operations
- No need to recode the entire BLAS library to support batching
30 Dynamic Parallelism: Better Programming Model, Simpler Code

LU decomposition (Fermi)
CPU code:

    dgetrf(n, N) {
        for j = 1 to N
            for i = 1 to 64
                idamax<<<>>>
                memcpy
                dswap<<<>>>
                memcpy
                dscal<<<>>>
                dger<<<>>>
            next i
            memcpy
            dlaswp<<<>>>
            dtrsm<<<>>>
            dgemm<<<>>>
        next j
    }

GPU code:

    idamax(); dswap(); dscal(); dger(); dlaswp(); dtrsm(); dgemm();

LU decomposition (Kepler): the CPU is free
CPU code:

    dgetrf(n, N) {
        dgetrf<<<>>>
        synchronize();
    }

GPU code:

    dgetrf(n, N) {
        for j = 1 to N
            for i = 1 to 64
                idamax<<<>>>
                dswap<<<>>>
                dscal<<<>>>
                dger<<<>>>
            next i
            dlaswp<<<>>>
            dtrsm<<<>>>
            dgemm<<<>>>
        next j
    }
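The inner `i` loop of both versions is the classic unblocked LU factorization with partial pivoting. A plain C sketch of those four steps (idamax, dswap, dscal, dger), with a serial loop over a batch of small matrices standing in for the GPU control grid that owns one matrix per thread. The column-major layout and the names `AT`, `lu_factor`, `lu_det` and `lu_factor_batched` are assumptions for illustration; this is not the cuBLAS or MAGMA implementation, and the blocked dlaswp/dtrsm/dgemm outer step is omitted:

```c
#include <assert.h>
#include <stddef.h>

/* Column-major element (i, j) of an n x n matrix m. */
#define AT(m, n, i, j) (m)[(i) + (size_t)(j) * (size_t)(n)]

static double absval(double x) { return x < 0.0 ? -x : x; }

/* Unblocked LU with partial pivoting for one n x n column-major matrix
   (illustrative sketch, not a library routine). On return a holds U on and
   above the diagonal and the multipliers of unit lower-triangular L below it;
   ipiv[k] is the row swapped with row k. Returns 0, or k + 1 on a zero pivot. */
int lu_factor(double *a, int n, int *ipiv)
{
    int info = 0;
    assert(n > 0);
    for (int k = 0; k < n; k++) {
        int p = k;                                   /* idamax */
        for (int i = k + 1; i < n; i++)
            if (absval(AT(a, n, i, k)) > absval(AT(a, n, p, k)))
                p = i;
        ipiv[k] = p;
        if (p != k)                                  /* dswap: whole rows */
            for (int j = 0; j < n; j++) {
                double t = AT(a, n, k, j);
                AT(a, n, k, j) = AT(a, n, p, j);
                AT(a, n, p, j) = t;
            }
        double piv = AT(a, n, k, k);
        if (piv == 0.0) {                            /* singular column */
            if (info == 0) info = k + 1;
            continue;
        }
        for (int i = k + 1; i < n; i++)              /* dscal */
            AT(a, n, i, k) /= piv;
        for (int j = k + 1; j < n; j++)              /* dger: rank-1 update */
            for (int i = k + 1; i < n; i++)
                AT(a, n, i, j) -= AT(a, n, i, k) * AT(a, n, k, j);
    }
    return info;
}

/* det(A) from the factors: product of U's diagonal, sign flipped per swap. */
double lu_det(const double *lu, const int *ipiv, int n)
{
    double det = 1.0;
    for (int k = 0; k < n; k++) {
        det *= AT(lu, n, k, k);
        if (ipiv[k] != k) det = -det;
    }
    return det;
}

/* Batch driver: with Dynamic Parallelism one thread of the control grid would
   own each matrix; here a serial loop stands in for that grid.
   Returns the number of matrices that hit a zero pivot. */
int lu_factor_batched(double *mats, int n, int batch, int *ipivs)
{
    int failures = 0;
    assert(n > 0 && batch >= 0);
    for (int b = 0; b < batch; b++)
        if (lu_factor(mats + (size_t)b * n * n, n, ipivs + (size_t)b * n) != 0)
            failures++;
    return failures;
}
```

For example, the matrix with rows (2, 1, 1), (4, 3, 3), (8, 7, 9), stored column-major as {2, 4, 8, 1, 3, 7, 1, 3, 9}, factors with pivot rows {2, 2, 2}, and `lu_det` recovers its determinant, 4.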
31 CUDA Dynamic Parallelism and Programmer Productivity
32 GPU Management: nvidia-smi
Multi-GPU systems are widely available, and different systems are set up differently. nvidia-smi inspects and modifies GPU state and gives quick information on:
- Approximate GPU utilization
- Approximate memory footprint
- Number of GPUs
- ECC state
- Driver version
Example output for a two-GPU system:

    Thu Nov  1 09:10:…
    | NVIDIA-SMI …                    Driver Version: …                            |
    | GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K20X               | 0000:03:00.0     Off |                  Off |
    | N/A   30C    P8    28W / 235W |  0%   12MB / 6143MB  |      0%      Default |
    |   1  Tesla K20X               | 0000:85:00.0     Off |                  Off |
    | N/A   28C    P8    26W / 235W |  0%   12MB / 6143MB  |      0%      Default |

    | Compute processes:                                               GPU Memory |
    |  GPU       PID  Process name                                     Usage      |
    |=============================================================================|
    |  No running compute processes found                                         |
33 OpenGL and Tesla
- Tesla K20/K20X for high-performance compute
- Tesla K20/K20X for graphics and compute: use interop to mix OpenGL and compute
34 NVIDIA IndeX
- Cluster-based graphics infrastructure
- Real-time manipulation of huge datasets
- Combined volume and surface rendering
- Project size scales with cluster size
- Interactive collaboration with global teams
35 [Figure: workloads (HPC: long-running application with data; HPC + Viz: read back viz frames of HPC results; new apps: encoding, ray tracing with iray and OptiX, RealityServer on CUDA) moving from the desktop workstation running the ISV app to remoted or back-racked servers. Remoting via Citrix HDX, VMware, MS RemoteFX and NICE DCV; rack/blade workstations via HP RGS and Dell Teradici. Hardware: Tesla and NVIDIA GRID (passive thermals), Maximus/Quadro (active thermals).]
36 Non-Tesla NVIDIA GPUs for HPC
37 Introducing GeForce GTX TITAN: The Ultimate CUDA Development GPU
Personal supercomputer on your desktop:
- 2688 CUDA cores
- 4.5 teraflops single precision
- 1.27 teraflops double precision
- 288 GB/s memory bandwidth
38 Tesla Advantage
- Performance: fastest DP of 1.31 TFLOPS on Tesla K20X; optimized for InfiniBand with NVIDIA GPUDirect; faster shuffle instructions; tuning and optimization support from NVIDIA experts
- Reliability: ECC protection; tested to run real-world workloads 24/7 at 100% utilization; 3-year warranty and prioritized support for bugs and feature requests
- ISVs certify only on Tesla; NVIDIA technical support; longer life cycle for continuity and cluster expansion
- Built for HPC: integrated solutions from Tier 1 OEMs; Hyper-Q for accelerating MPI-based workloads; tools for GPU management and monitoring (nvidia-healthmon, nvidia-smi/NVML); enterprise OS support; solution expertise provided by CUDA engineers and technical staff; peta-scale designed, tested and optimized
39 Accelerated Computing: 10x Performance, 5x Energy Efficiency
- CPU: optimized for serial tasks
- GPU accelerator: optimized for many parallel tasks
40 GPU-Accelerated Apps Grow 60%
[Figure: number of GPU-accelerated apps, 61% increase.]
Top supercomputing apps, accelerated or in development:
- Computational chemistry: AMBER, CHARMM, GROMACS, LAMMPS, NAMD, DL_POLY
- Material science: QMCPACK, Quantum Espresso, GAMESS, Gaussian, NWChem, VASP
- Climate & weather: COSMO, GEOS-5, CAM-SE, NIM, WRF
- Physics: Chroma, Denovo, GTC, GTS, ENZO, MILC
- CAE: ANSYS Mechanical, MSC Nastran, SIMULIA Abaqus, ANSYS Fluent, OpenFOAM, LS-DYNA
41 200+ GPU-Accelerated Applications
43 Small Changes, Big Speed-up
Move the compute-intensive functions of the application code to the GPU, which parallelizes them; the rest of the sequential code keeps running on the CPU.
45 3 Ways to Accelerate Applications
- Libraries
- OpenACC directives
- Programming languages (CUDA, …)
- High-level languages (MATLAB, …)
CUDA libraries and the CUDA language are both interoperable with OpenACC. [Figure: the approaches range from the easiest, needing no programming expertise, to maximum performance.]
46 OpenACC Directives
Easy, open, powerful: simple compiler hints added to your original Fortran or C code.
- Works on multicore CPUs and many-core GPUs
- The compiler parallelizes the code
- Future integration into the OpenMP standard is planned

    program myscience
       ... serial code ...
    !$acc kernels
       do k = 1,n1
          do i = 1,n2
             ... parallel code ...
          enddo
       enddo
    !$acc end kernels
       ...
    end program myscience
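47 Familiar to OpenMP Programmers
OpenMP (CPU):

    #include <stdio.h>

    int main(void) {
        const long n = 1000000;   /* number of intervals (not given on the slide) */
        double pi = 0.0;
        long i;
    #pragma omp parallel for reduction(+:pi)
        for (i = 0; i < n; i++) {
            double t = (i + 0.5) / n;      /* midpoint of interval i */
            pi += 4.0 / (1.0 + t * t);
        }
        printf("pi = %f\n", pi / n);
        return 0;
    }

OpenACC (CPU and GPU):

    #include <stdio.h>

    int main(void) {
        const long n = 1000000;
        double pi = 0.0;
        long i;
    #pragma acc kernels
        for (i = 0; i < n; i++) {
            double t = (i + 0.5) / n;
            pi += 4.0 / (1.0 + t * t);
        }
        printf("pi = %f\n", pi / n);
        return 0;
    }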
48 OpenACC: Easy and Portable
Serial code (SAXPY):

    do i = 1, 2560
       do j = 1, …
          fa(i) = a * fa(i) + fb(i)
       end do
    end do

OpenACC version, which runs on GPUs and Xeon Phi:

    !$acc parallel loop
    do i = 1, 2560
    !dir$ unroll 1000
       do j = 1, …
          fa(i) = a * fa(i) + fb(i)
       end do
    end do

The loop uses 2 levels of hardware parallelism: thread blocks 0 to N, and the threads inside each block, where each thread does

    float x = input[threadid];
    float y = func(x);
    output[threadid] = y;
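The same pattern in C, as a sketch: `reps` is a hypothetical stand-in for the inner loop bound that is missing from the slide, and `repeated_axpy` is an illustrative name. Each iteration of `i` plays the role of one GPU thread, and the directive lets the compiler map those iterations onto thread blocks and threads; the `j` loop repeatedly updates the same element, so it stays sequential inside each `i`. Without an OpenACC compiler the pragma is ignored and the loop runs serially:

```c
#include <assert.h>

/* C analogue of the slide's loop nest. The outer loop is independent across
   elements, so it is offered to the accelerator; the inner loop is a
   dependent chain on one element and stays sequential.
   reps is a hypothetical stand-in for the slide's missing inner bound. */
void repeated_axpy(int n, int reps, float a, float *restrict fa,
                   const float *restrict fb)
{
    assert(n >= 0 && reps >= 0);
#pragma acc parallel loop copy(fa[0:n]) copyin(fb[0:n])
    for (int i = 0; i < n; i++) {
        float x = fa[i];                  /* like input[threadid] */
        for (int j = 0; j < reps; j++)    /* sequential chain on one element */
            x = a * x + fb[i];
        fa[i] = x;                        /* like output[threadid] = y */
    }
}
```

With `a = 2`, every `fa[i] = 1` and every `fb[i] = 1`, three repetitions take each element through 3 and 7 to 15.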
49 Additions for OpenACC 2.0
- Procedure calls, separate compilation
- Nested parallelism
- Device-specific tuning, multiple devices
- Data management features and global data
- Multiple host thread support
- Loop directive additions
- Asynchronous behavior additions
- New API routines for target platforms (CUDA, OpenCL, Intel Coprocessor Offload Infrastructure)
See …
50 Applying OpenACC to Legacy Codes (from GTC 2013)
Exploit the GPU with LESS effort; maintain ONE legacy source. Examples of REAL-WORLD application tuning using directives (CPU+GPU vs. multi-core):
- ELAN (computational electromagnetics). Goals: optimize with less effort, preserve the code base. Kernels 6.5X to 13X faster than a 16-core Xeon; overall speedup 3.2X.
- COSMO (weather). Goal: preserve the physics code (22% of runtime), augmenting dynamics kernels already in CUDA. Physics speedup 4.2X vs. a multi-core Xeon.
Results from EMGS, MeteoSwiss/CSCS.
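Amdahl's law (not named on the slide) relates the kernel and whole-application figures for ELAN. A minimal sketch with hypothetical function names, assuming the kernel speedup applies uniformly to the accelerated part and ignoring host-device transfer costs:

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p of the original runtime is
   accelerated by a factor s and the rest is unchanged. Assumes the speedup
   applies uniformly to that fraction and ignores data-transfer costs. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Inverse: from 1/S = 1 - p(1 - 1/s), the accelerated fraction p implied by
   an observed overall speedup S when the accelerated part runs s times
   faster (requires s > 1). */
double amdahl_fraction(double S, double s)
{
    assert(S > 0.0 && s > 1.0);
    return (1.0 - 1.0 / S) / (1.0 - 1.0 / s);
}
```

`amdahl_fraction(3.2, 6.5)` is about 0.81 and `amdahl_fraction(3.2, 13.0)` about 0.74: under these assumptions the accelerated kernels would account for roughly three quarters of ELAN's original runtime, and plugging 0.8125 back into `amdahl_speedup` with s = 6.5 returns the 3.2X overall figure.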
51 Small Effort. Real Impact.
- Large oil company: 3x in 7 days. Solving billions of equations iteratively for oil production at the world's largest petroleum reservoirs.
- Univ. of Houston, Prof. M.A. Kayali: 20x in 2 days. Studying magnetic systems for innovations in magnetic storage media and memory, field sensors, and …
- Univ. of Melbourne, Prof. Kerry Black: 65x in 2 days. Better understanding the complex reasons behind the lifecycles of snapper fish in Port Phillip Bay.
- Ufa State Aviation, Prof. Arthur Yuldashev: 7x in 4 weeks. Generating stochastic geological models of oilfield reservoirs with borehole data.
- GAMESS-UK, Dr. Wilkinson, Prof. Naidoo: 10x. Used in various fields, such as investigating biofuel production and molecular sensors.
52 Example: Jacobi Iteration
Iteratively converges to the correct value (e.g. temperature) by computing new values at each point from the average of the neighboring points. A common, useful algorithm.
Example: solve the Laplace equation in 2D, $\nabla^2 f(x,y) = 0$. Each point A(i,j) is updated from its neighbors A(i-1,j), A(i+1,j), A(i,j-1) and A(i,j+1):

$$A^{k+1}_{i,j} = \frac{A^{k}_{i-1,j} + A^{k}_{i+1,j} + A^{k}_{i,j-1} + A^{k}_{i,j+1}}{4}$$
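The averaging rule follows from discretizing the Laplacian. A short derivation, assuming a uniform grid spacing $h$ in both directions (the slide does not state the spacing): the standard five-point finite-difference approximation at point $(i,j)$ is

$$\nabla^2 f \approx \frac{A_{i+1,j} + A_{i-1,j} + A_{i,j+1} + A_{i,j-1} - 4A_{i,j}}{h^2} = 0
\quad\Longrightarrow\quad
A_{i,j} = \frac{A_{i-1,j} + A_{i+1,j} + A_{i,j-1} + A_{i,j+1}}{4}.$$

Jacobi iteration evaluates the right-hand side with the old iterate $A^{k}$ at every point to produce $A^{k+1}$, which is why the code keeps two arrays, A and Anew.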
53 Jacobi Iteration: Fortran Code

    ! A and Anew are assumed to be declared (0:n-1, 0:m-1); boundary values stay fixed
    do while ( err > tol .and. iter < iter_max )      ! iterate until converged
      err = 0._fp_kind
      do j = 1, m-2                                    ! iterate across interior matrix elements
        do i = 1, n-2
          ! calculate new value from neighbors
          Anew(i,j) = .25_fp_kind * (A(i+1, j  ) + A(i-1, j  ) + &
                                     A(i  , j-1) + A(i  , j+1))
          ! compute max error for convergence
          err = max(err, abs(Anew(i,j) - A(i,j)))
        end do
      end do
      do j = 1, m-2                                    ! swap input/output arrays
        do i = 1, n-2
          A(i,j) = Anew(i,j)
        end do
      end do
      iter = iter + 1
    end do
54 Jacobi Iteration: OpenACC Fortran Code

    ! copy A in at the beginning of the loop and out at the end; allocate Anew on the accelerator
    !$acc data copy(A) create(Anew)
    do while ( err > tol .and. iter < iter_max )
      err = 0._fp_kind
    !$acc kernels
      do j = 1, m-2
        do i = 1, n-2
          Anew(i,j) = .25_fp_kind * (A(i+1, j  ) + A(i-1, j  ) + &
                                     A(i  , j-1) + A(i  , j+1))
          err = max(err, abs(Anew(i,j) - A(i,j)))
        end do
      end do
    !$acc end kernels
      ...
      iter = iter + 1
    end do
    !$acc end data
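For readers working in C, the same loop nest with OpenACC directives might look like the following sketch. The grid size, the `jacobi` and `set_boundary` helpers, and the use of an explicit `parallel loop ... reduction(max:err)` (rather than relying on the compiler to detect the reduction inside `kernels`) are assumptions; without an OpenACC compiler the pragmas are ignored and the code runs serially:

```c
#include <assert.h>

#define NX 16   /* hypothetical grid size, boundary included */
#define NY 16

/* Hypothetical helper: put `value` on the four edges of the grid. */
void set_boundary(double A[NY][NX], double value)
{
    for (int i = 0; i < NX; i++) { A[0][i] = value; A[NY - 1][i] = value; }
    for (int j = 0; j < NY; j++) { A[j][0] = value; A[j][NX - 1] = value; }
}

/* Jacobi sweeps on a 2-D grid whose boundary values are held fixed.
   Returns the number of sweeps; *err_out receives the last max change. */
int jacobi(double A[NY][NX], double Anew[NY][NX],
           double tol, int iter_max, double *err_out)
{
    double err = tol + 1.0;
    int iter = 0;
    assert(tol > 0.0 && iter_max > 0);
#pragma acc data copy(A[0:NY][0:NX]) create(Anew[0:NY][0:NX])
    while (err > tol && iter < iter_max) {
        err = 0.0;
        /* new value from the four neighbours; track the largest change */
#pragma acc parallel loop collapse(2) reduction(max:err)
        for (int j = 1; j < NY - 1; j++)
            for (int i = 1; i < NX - 1; i++) {
                Anew[j][i] = 0.25 * (A[j][i + 1] + A[j][i - 1] +
                                     A[j - 1][i] + A[j + 1][i]);
                double d = Anew[j][i] - A[j][i];
                if (d < 0.0) d = -d;
                if (d > err) err = d;
            }
        /* copy the interior back (the "swap" step) */
#pragma acc parallel loop collapse(2)
        for (int j = 1; j < NY - 1; j++)
            for (int i = 1; i < NX - 1; i++)
                A[j][i] = Anew[j][i];
        iter++;
    }
    *err_out = err;
    return iter;
}
```

Setting the boundary to 1.0 with an all-zero interior gives a convenient sanity check: the solution of the Laplace equation is then 1.0 everywhere, so the interior values should climb toward 1.0 until the maximum change per sweep falls below the tolerance.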
55 3 Ways to Accelerate Applications
- Libraries: drop-in acceleration
- OpenACC directives: easily accelerate applications
- Programming languages: maximum flexibility
56 Some GPU-Accelerated Libraries
NVIDIA cuBLAS, NVIDIA cuRAND, NVIDIA cuSPARSE, NVIDIA NPP, NVIDIA cuFFT, IMSL Library and ArrayFire matrix computations, plus libraries for vector signal image processing, GPU-accelerated linear algebra, matrix algebra on GPU and multicore, building-block algorithms for CUDA, sparse linear algebra, and C++ STL features for CUDA.
57 Explore the CUDA (Libraries) Ecosystem
CUDA tools and the ecosystem are described in detail on NVIDIA Developer Zone: developer.nvidia.com/cuda-toolsecosystem
58 3 Ways to Accelerate Applications
- Libraries: drop-in acceleration
- OpenACC directives: easily accelerate applications
- Programming languages: maximum flexibility
59 GPU Programming Languages
- Numerical analytics: MATLAB, Mathematica, LabVIEW
- Fortran: OpenACC, CUDA Fortran
- C: OpenACC, CUDA C
- C++: Thrust, CUDA C++
- Python: PyCUDA, Copperhead, NumbaPro (Continuum Analytics)
- C#: GPU.NET, Hybridizer (AltiMesh)
60 Get Started Today
These languages are supported on all CUDA-capable GPUs. You might already have a CUDA-capable GPU in your laptop or desktop PC!
- CUDA C/C++
- GPU.NET
- Thrust C++ Template Library
- CUDA Fortran
- PyCUDA (Python)
- MATLAB: …matlab-gpu.html
- Mathematica: …-in-8/cuda-and-opencl-support/
61 Easiest Way to Learn CUDA
- 50k enrolled from 127 countries
- Learn from the best: Prof. John Owens (UC Davis), Dr. David Luebke (NVIDIA Research), Prof. Wen-mei W. Hwu (U of Illinois)
- Courses: Introduction to Parallel Programming; Heterogeneous Parallel Programming
- Anywhere, any time: online, worldwide, self-paced
- It's free: no tuition, no hardware, no books
- Engage with an active community: forums and meetups, hands-on projects
62 Thank You
NVIDIA Tesla Update, Supercomputing 12. Sumit Gupta, General Manager, Tesla Accelerated Computing
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationStan Posey, NVIDIA, Santa Clara, CA, USA
Stan Posey, sposey@nvidia.com NVIDIA, Santa Clara, CA, USA NVIDIA Strategy for CWO Modeling (Since 2010) Initial focus: CUDA applied to climate models and NWP research Opportunities to refactor code with
More informationIntroduction to High Performance Computing. Shaohao Chen Research Computing Services (RCS) Boston University
Introduction to High Performance Computing Shaohao Chen Research Computing Services (RCS) Boston University Outline What is HPC? Why computer cluster? Basic structure of a computer cluster Computer performance
More informationCME 213 S PRING Eric Darve
CME 213 S PRING 2017 Eric Darve Summary of previous lectures Pthreads: low-level multi-threaded programming OpenMP: simplified interface based on #pragma, adapted to scientific computing OpenMP for and
More informationMathematical computations with GPUs
Master Educational Program Information technology in applications Mathematical computations with GPUs GPU architecture Alexey A. Romanenko arom@ccfit.nsu.ru Novosibirsk State University GPU Graphical Processing
More informationENDURING DIFFERENTIATION Timothy Lanfear
ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING GPU-ACCELERATED PERFORMANCE 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 10 3 10 2 Single-threaded perf
More informationENDURING DIFFERENTIATION. Timothy Lanfear
ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 Transistors (thousands) 1.1X per year 10 3 10 2 Single-threaded
More informationHigh-Productivity CUDA Programming. Levi Barnes, Developer Technology Engineer, NVIDIA
High-Productivity CUDA Programming Levi Barnes, Developer Technology Engineer, NVIDIA MORE RESOURCES How to learn more GTC -- March 2014 San Jose, CA gputechconf.com Video archives, too Qwiklabs nvlabs.qwiklabs.com
More informationWorld s most advanced data center accelerator for PCIe-based servers
NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying
More informationGetting Started with CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator
Getting Started with CUDA C/C++ Mark Ebersole, NVIDIA CUDA Educator Heterogeneous Computing CPU GPU Once upon a time Past Massively Parallel Supercomputers Goodyear MPP Thinking Machine MasPar Cray 2 1.31
More informationThe Visual Computing Company
The Visual Computing Company Update NVIDIA GPU Ecosystem Axel Koehler, Senior Solutions Architect HPC, NVIDIA Outline Tesla K40 and GPU Boost Jetson TK-1 Development Board for Embedded HPC Pascal GPU 3D
More informationGPU ARCHITECTURE Chris Schultz, June 2017
Chris Schultz, June 2017 MISC All of the opinions expressed in this presentation are my own and do not reflect any held by NVIDIA 2 OUTLINE Problems Solved Over Time versus Why are they different? Complex
More informationHiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.
HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation
More informationCSC573: TSHA Introduction to Accelerators
CSC573: TSHA Introduction to Accelerators Sreepathi Pai September 5, 2017 URCS Outline Introduction to Accelerators GPU Architectures GPU Programming Models Outline Introduction to Accelerators GPU Architectures
More informationGPU Computing with OpenACC Directives Presented by Bob Crovella. Authored by Mark Harris NVIDIA Corporation
GPU Computing with OpenACC Directives Presented by Bob Crovella Authored by Mark Harris NVIDIA Corporation GPUs Reaching Broader Set of Developers 1,000,000 s 100,000 s Early Adopters Research Universities
More informationThe Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations
The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations Ophir Maor HPC Advisory Council ophir@hpcadvisorycouncil.com The HPC-AI Advisory Council World-wide HPC non-profit
More informationMAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures
MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures Stan Tomov Innovative Computing Laboratory University of Tennessee, Knoxville OLCF Seminar Series, ORNL June 16, 2010
More informationProgramming paradigms for GPU devices
Programming paradigms for GPU devices OpenAcc Introduction Sergio Orlandini s.orlandini@cineca.it 1 OpenACC introduction express parallelism optimize data movements practical examples 2 3 Ways to Accelerate
More informationCMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC. Guest Lecturer: Sukhyun Song (original slides by Alan Sussman)
CMSC 714 Lecture 6 MPI vs. OpenMP and OpenACC Guest Lecturer: Sukhyun Song (original slides by Alan Sussman) Parallel Programming with Message Passing and Directives 2 MPI + OpenMP Some applications can
More informationSteve Scott, Tesla CTO SC 11 November 15, 2011
Steve Scott, Tesla CTO SC 11 November 15, 2011 What goal do these products have in common? Performance / W Exaflop Expectations First Exaflop Computer K Computer ~10 MW CM5 ~200 KW Not constant size, cost
More informationPiz Daint: Application driven co-design of a supercomputer based on Cray s adaptive system design
Piz Daint: Application driven co-design of a supercomputer based on Cray s adaptive system design Sadaf Alam & Thomas Schulthess CSCS & ETHzürich CUG 2014 * Timelines & releases are not precise Top 500
More informationSystem Design of Kepler Based HPC Solutions. Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering.
System Design of Kepler Based HPC Solutions Saeed Iqbal, Shawn Gao and Kevin Tubbs HPC Global Solutions Engineering. Introduction The System Level View K20 GPU is a powerful parallel processor! K20 has
More informationTECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING
TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING Accelerated computing is revolutionizing the economics of the data center. HPC enterprise and hyperscale customers deploy
More informationGeneral Purpose GPU Computing in Partial Wave Analysis
JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data
More informationADVANCES IN EXTREME-SCALE APPLICATIONS ON GPU. Peng Wang HPC Developer Technology
ADVANCES IN EXTREME-SCALE APPLICATIONS ON GPU Peng Wang HPC Developer Technology NVIDIA SuperPhones to SuperComputers Computers no longer get faster, just wider Architectural Features Common to All Processors
More informationarxiv: v1 [physics.comp-ph] 4 Nov 2013
arxiv:1311.0590v1 [physics.comp-ph] 4 Nov 2013 Performance of Kepler GTX Titan GPUs and Xeon Phi System, Weonjong Lee, and Jeonghwan Pak Lattice Gauge Theory Research Center, CTP, and FPRD, Department
More informationNVIDIA Update and Directions on GPU Acceleration for Earth System Models
NVIDIA Update and Directions on GPU Acceleration for Earth System Models Stan Posey, HPC Program Manager, ESM and CFD, NVIDIA, Santa Clara, CA, USA Carl Ponder, PhD, Applications Software Engineer, NVIDIA,
More informationCS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS
CS 179: GPU Computing LECTURE 4: GPU MEMORY SYSTEMS 1 Last time Each block is assigned to and executed on a single streaming multiprocessor (SM). Threads execute in groups of 32 called warps. Threads in
More informationStan Posey, CAE Industry Development NVIDIA, Santa Clara, CA, USA
Stan Posey, CAE Industry Development NVIDIA, Santa Clara, CA, USA NVIDIA and HPC Evolution of GPUs Public, based in Santa Clara, CA ~$4B revenue ~5,500 employees Founded in 1999 with primary business in
More informationHigh-Productivity CUDA Programming. Cliff Woolley, Sr. Developer Technology Engineer, NVIDIA
High-Productivity CUDA Programming Cliff Woolley, Sr. Developer Technology Engineer, NVIDIA HIGH-PRODUCTIVITY PROGRAMMING High-Productivity Programming What does this mean? What s the goal? Do Less Work
More informationHPC with GPU and its applications from Inspur. Haibo Xie, Ph.D
HPC with GPU and its applications from Inspur Haibo Xie, Ph.D xiehb@inspur.com 2 Agenda I. HPC with GPU II. YITIAN solution and application 3 New Moore s Law 4 HPC? HPC stands for High Heterogeneous Performance
More informationIBM CORAL HPC System Solution
IBM CORAL HPC System Solution HPC and HPDA towards Cognitive, AI and Deep Learning Deep Learning AI / Deep Learning Strategy for Power Power AI Platform High Performance Data Analytics Big Data Strategy
More informationDELIVERING HIGH-PERFORMANCE REMOTE GRAPHICS WITH NVIDIA GRID VIRTUAL GPU. Andy Currid NVIDIA
DELIVERING HIGH-PERFORMANCE REMOTE GRAPHICS WITH NVIDIA GRID VIRTUAL Andy Currid NVIDIA WHAT YOU LL LEARN IN THIS SESSION NVIDIA's GRID Virtual Architecture What it is and how it works Using GRID Virtual
More informationThe State of Accelerated Applications. Michael Feldman
The State of Accelerated Applications Michael Feldman Accelerator Market in HPC Nearly half of all new HPC systems deployed incorporate accelerators Accelerator hardware performance has been advancing
More informationSelecting the right Tesla/GTX GPU from a Drunken Baker's Dozen
Selecting the right Tesla/GTX GPU from a Drunken Baker's Dozen GPU Computing Applications Here's what Nvidia says its Tesla K20(X) card excels at doing - Seismic processing, CFD, CAE, Financial computing,
More informationHow to perform HPL on CPU&GPU clusters. Dr.sc. Draško Tomić
How to perform HPL on CPU&GPU clusters Dr.sc. Draško Tomić email: drasko.tomic@hp.com Forecasting is not so easy, HPL benchmarking could be even more difficult Agenda TOP500 GPU trends Some basics about
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware
More informationNEW FEATURES IN CUDA 6 MAKE GPU ACCELERATION EASIER MARK HARRIS
NEW FEATURES IN CUDA 6 MAKE GPU ACCELERATION EASIER MARK HARRIS 1 Unified Memory CUDA 6 2 3 XT and Drop-in Libraries GPUDirect RDMA in MPI 4 Developer Tools 1 Unified Memory CUDA 6 2 3 XT and Drop-in Libraries
More informationTESLA P100 PERFORMANCE GUIDE. Deep Learning and HPC Applications
TESLA P PERFORMANCE GUIDE Deep Learning and HPC Applications SEPTEMBER 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important
More informationMAGMA. Matrix Algebra on GPU and Multicore Architectures
MAGMA Matrix Algebra on GPU and Multicore Architectures Innovative Computing Laboratory Electrical Engineering and Computer Science University of Tennessee Piotr Luszczek (presenter) web.eecs.utk.edu/~luszczek/conf/
More informationSpeedup Altair RADIOSS Solvers Using NVIDIA GPU
Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair
More information