Supercomputing at 1/10th the Cost


1 Supercomputing at 1/10th the Cost

2 GPU Computing (aka GPGPU): CPU + GPU co-processing. CPU: 4 cores, 48 GigaFlops (DP). GPU: 515 GigaFlops (DP).

3 [Chart: GPU speed-ups across application domains, ranging from roughly 15x to 150x.] Examples: Medical Imaging (U of Utah), Molecular Dynamics (U of Illinois, Urbana), Video Transcoding (Elemental Tech), Matlab Computing (AccelerEyes), Astrophysics (RIKEN), Financial Simulation (Oxford), Linear Algebra (Universidad Jaime), 3D Ultrasound (Techniscan), Quantum Chemistry (U of Illinois, Urbana), Gene Sequencing (U of Maryland).

4 Tesla Data Center & Workstation GPU Solutions. Tesla M-series GPUs (M2070, M2050, M1060) for integrated CPU-GPU servers and blades (example: Nitro G5 1U). Tesla S-series 1U systems (S2050, S1070), paired with an OEM CPU server. Tesla C-series GPUs (C2070, C2050, C1060) for workstations with 2 to 4 Tesla GPUs.

5 Fermi: The Computational GPU. Performance: 7x the double-precision performance of CPUs; IEEE SP & DP floating point. Flexibility: shared memory increased from 16 KB to 48 KB; L1 and L2 caches added; ECC on all internal and external memories; up to 1 TeraByte of GPU memory; high-speed GDDR5 memory interface. Usability: multiple simultaneous tasks on the GPU; 10x faster atomic operations; C++ support; system calls and printf support.

6 NVIDIA Tesla GPU Computing Products

Data center server modules:
- Tesla M2070 / M2050: 1 T20 GPU; 1030 GFlops single precision; 515 GFlops double precision; 6 GB / 3 GB memory.
- Tesla M1060: 1 T10 GPU; 933 GFlops single precision; 78 GFlops double precision; 4 GB memory; 102 GB/s memory bandwidth.

Data center 1U systems:
- Tesla S2050: 4 T20 GPUs; 4120 GFlops single precision; 2060 GFlops double precision; 12 GB memory.
- Tesla S1070: 4 T10 GPUs; 4140 GFlops single precision; 346 GFlops double precision; 16 GB memory (4 GB per GPU); 102 GB/s memory bandwidth per GPU.

Workstation boards:
- Tesla C2070 / C2050: 1 T20 GPU; 1030 GFlops single precision; 515 GFlops double precision; 6 GB / 3 GB memory; 144 GB/s memory bandwidth.
- Tesla C1060: 1 T10 GPU; 933 GFlops single precision; 78 GFlops double precision; 4 GB memory; 102 GB/s memory bandwidth.

7 NVIDIA GPGPU Developer Eco-System. Numerical packages: MATLAB, Mathematica, NI LabView, PyCUDA. Debuggers & profilers: cuda-gdb, NV Visual Profiler, Parallel Nsight for Visual Studio, Allinea, TotalView. GPU compilers: C, C++, Fortran, OpenCL, DirectCompute, Java, Python. Parallelizing compilers: PGI Accelerator, CAPS HMPP, MCUDA, OpenMP. Libraries: BLAS, FFT, LAPACK, NPP, video/imaging, GPULib. GPGPU consultants & training and OEM solution providers: ANEO, GPU Tech.

8 World's Fastest GPU Supercomputer: 1.27 Petaflops. Top500: #1 in Asia, #2 in the World.

9 #2: Dawning Nebulae. 1.27 Petaflops, 4640 GPUs, 2.55 MWatts.

10 [Chart: power (MegaWatts) vs. Linpack performance (Teraflops).] 2x performance/watt for Tesla GPU systems (Nebulae; IPE, CAS) compared with Jaguar (x86 CPU), JUGENE (BlueGene), and Roadrunner (Cell).

11 8x Higher Linpack Performance. [Charts comparing a CPU server with a GPU-CPU server: performance (Gflops), performance/$ (Gflops/$K), and performance/watt (Gflops/kwatt).] CPU 1U server: 2x Intel Xeon X5550 (Nehalem), 2.66 GHz, 48 GB memory, $7K, 0.55 kW. GPU-CPU 1U server: 2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1.0 kW.

12 20+ Oil & Gas Companies Porting to CUDA. GPU vs. CPU improvements for successful customers: performance/watt 18x-27x, performance/space 20x-31x, performance/cost 15x-20x. For Oil & Gas ISVs: performance/watt 12x-17x, performance/space 15x-20x, performance/cost 10x-12x.

13 Oil & Gas: Seismic Processing. Equal performance: 32 Tesla S1070s vs. 2000 CPU servers. 31x less space. 20x lower cost: ~$400K vs. ~$8M. 27x lower power: 45 kWatts vs. 1200 kWatts.

14 Finance: 10+ Banks Porting to CUDA. Successful customers include several unannounced finance ISVs and UnRisk. Derivative pricing using SciFinance (basket equity-linked structured note), versus an Intel Xeon (2.6 GHz) baseline: 2 Tesla C1060s, 0.25 secs (124x speed-up); 1 Tesla C1060 (77x speed-up).

15 Bloomberg: Bond Pricing. 48 GPUs vs. 2000 CPUs: 42x lower space. $144K vs. $4 Million: 28x lower cost. $31K/year vs. $1.2 Million/year: 38x lower power cost.

16 Tesla Bio WorkBench: Bio-Chemistry & Bio-Informatics. Applications: TeraChem, Hex (docking), LAMMPS, CUDA-BLASTP, CUDA-MEME, MUMmerGPU, CUDA-EC. Community: downloads, documentation, technical papers, discussion forums, benchmarks & configurations. Platforms: Tesla Personal Supercomputer, Tesla GPU clusters.

17 Finding a Better Shampoo. One Tesla Personal Supercomputer vs. 32 CPU servers at equal performance: no data center required (1 kWatt vs. 21 kWatts); 13x lower system cost ($7K vs. $128K); 19x lower power & cooling cost ($2K vs. $37K).

18 The Tesla Advantage

19 Parallel Computing on All GPUs: 100+ Million CUDA GPUs Deployed. GeForce: entertainment. Tesla: high-performance computing. Quadro: design & creation.

20 Tesla: Built for Professional Computing

Features and benefits:
- 4x higher double precision (on 20-series): higher performance for scientific CUDA applications.
- ECC, available only on Tesla & Quadro (on 20-series): data reliability inside the GPU and on DRAM memories.
- Bi-directional PCI-E communication (Tesla has dual DMA engines; GeForce has only one): higher performance for CUDA applications by overlapping communication and computation.
- Larger memory for larger data sets (3 GB and 6 GB products): higher performance on a wide range of applications (medical, oil & gas, manufacturing, FEA, CAE).
- Cluster management software tools available on Tesla only: needed for GPU monitoring and job scheduling in data center deployments.
- TCC (Tesla Compute Cluster) driver for Windows, supported only on Tesla: higher performance for CUDA applications due to lower kernel launch overhead; TCC adds support for RDP and Services.
- Integrated OEM workstations and servers: trusted, reliable systems built for Tesla products.
- Professional ISVs certify CUDA applications only on Tesla: bug reproduction, support, and feature requests handled for Tesla.

Quality & warranty:
- 2 to 4 days of stress testing and memory burn-in, plus added margin in memory and core clocks: built for 24/7 computing in data center and workstation environments.
- Manufactured and guaranteed by NVIDIA, with a 3-year NVIDIA warranty: reliable, long-life products.

Support & lifecycle:
- Enterprise support and higher priority for CUDA bugs and requests: ability to influence the CUDA and GPU roadmap and get early access to feature requests.
- Extended product availability with a 6-month EOL notice, and no changes in key components such as GPU and memory without notice (always the same clocks for known, reliable performance): reliable product supply.

21 Programming GPUs

22 Doing GPGPU Right: a Combination of Hardware and Software. GPU computing applications sit on libraries and middleware: cuFFT, cuBLAS, CULA (LAPACK), NPP & CUDPP, video libraries, PhysX (physics), OptiX (ray tracing), mental ray and iray (rendering), RealityServer (3D web services). Languages and APIs: C, C++, Fortran, OpenCL, DirectCompute, Java and Python. Underneath is the NVIDIA GPU with the CUDA parallel computing architecture. (OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.)

23 CUDA C/C++: C with a few new keywords

Standard C code:

    void saxpy_serial(int n, float a, float *x, float *y)
    {
        for (int i = 0; i < n; ++i)
            y[i] = a*x[i] + y[i];
    }

    // Invoke serial SAXPY kernel
    saxpy_serial(n, 2.0, x, y);

Parallel CUDA C code:

    __global__ void saxpy_parallel(int n, float a, float *x, float *y)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a*x[i] + y[i];
    }

    // Invoke parallel SAXPY kernel with 256 threads/block
    int nblocks = (n + 255) / 256;
    saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
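The slide shows only the kernels and their invocation. To run the parallel version end to end, the host also has to allocate device memory and copy data across PCI-E. Below is a minimal host-side sketch assuming the saxpy_parallel kernel above; the problem size and initialization values are arbitrary examples, not part of the original slide.

    // Minimal host-side driver for the saxpy_parallel kernel shown above.
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int n = 1 << 20;                       // example problem size (assumption)
        size_t bytes = n * sizeof(float);
        float *x = (float*)malloc(bytes);
        float *y = (float*)malloc(bytes);
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        float *d_x, *d_y;                      // device copies of x and y
        cudaMalloc(&d_x, bytes);
        cudaMalloc(&d_y, bytes);
        cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

        int nblocks = (n + 255) / 256;         // same launch configuration as the slide
        saxpy_parallel<<<nblocks, 256>>>(n, 2.0f, d_x, d_y);

        cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
        printf("y[0] = %f\n", y[0]);           // expect 4.0 = 2*1 + 2

        cudaFree(d_x); cudaFree(d_y);
        free(x); free(y);
        return 0;
    }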

24 CUDA Programming Effort / Performance. (Source: MIT CUDA course.)

25 Fermi: Next Generation Architecture

26 Introducing the Fermi Architecture: 3 billion transistors; 512 cores (max); DP performance 50% of SP; ECC; L1 and L2 caches; GDDR5 memory; up to 1 Terabyte of GPU memory; concurrent kernels; C++.

27 Fermi SM Architecture: 32 CUDA cores per SM (512 total); double precision at 50% of single precision, an 8x improvement over GT200; dual thread scheduler; 64 KB of RAM per SM, configurable between shared memory and L1 cache.
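Because the 64 KB of per-SM RAM is configurable, the CUDA runtime lets a program state a per-kernel preference for how it is split. The sketch below uses the standard cudaFuncSetCacheConfig call; the kernel my_kernel is a hypothetical placeholder.

    #include <cuda_runtime.h>

    __global__ void my_kernel(float *data)
    {
        // placeholder kernel body; a real kernel would stage data in shared memory
    }

    void prefer_shared_memory(void)
    {
        // Request 48 KB shared memory / 16 KB L1 for this kernel;
        // cudaFuncCachePreferL1 would request the opposite split.
        cudaFuncSetCacheConfig(my_kernel, cudaFuncCachePreferShared);
    }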

28 CUDA Core Architecture: new IEEE floating-point standard, surpassing even the most advanced CPUs; fused multiply-add (FMA) instruction for both single and double precision; newly designed integer ALU optimized for 64-bit and extended-precision operations.

29 Cached Memory Hierarchy: the first GPU architecture to support a true cache hierarchy in combination with on-chip shared memory. An L1 cache per SM (per 32 cores) improves bandwidth and reduces latency; a unified 768 KB L2 cache provides fast, coherent data sharing across all cores in the GPU (the Parallel DataCache memory hierarchy).

30 ECC: ECC protection for DRAM (supported for GDDR5 memory) and for all major internal memories: register file, shared memory, L1 cache, and L2 cache. Detects 2-bit errors and corrects 1-bit errors (per word).

31 GigaThread Hardware Thread Scheduler: hierarchically manages thousands of simultaneously active threads; 10x faster application context switching; concurrent kernel execution.

32 GigaThread Hardware Thread Scheduler: concurrent kernel execution plus faster context switching (parallel rather than serial kernel execution).

33 GigaThread Streaming Data Transfer Engine: dual DMA engines allow simultaneous CPU-to-GPU and GPU-to-CPU data transfers, fully overlapped with CPU and GPU processing time. [Activity snapshot figure.]
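Exploiting the dual DMA engines from CUDA requires pinned host memory, asynchronous copies, and multiple streams, so that an upload in one stream can overlap a download in another and both can overlap kernel execution. A minimal two-stream pipeline sketch under those assumptions follows; process_chunk is a hypothetical kernel standing in for real work, and h_data is assumed to be allocated with cudaHostAlloc.

    #include <cuda_runtime.h>

    __global__ void process_chunk(float *buf, int n)        // hypothetical kernel
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) buf[i] *= 2.0f;
    }

    void pipeline(float *h_data, int n_chunks, int chunk)   // h_data: pinned host memory
    {
        cudaStream_t stream[2];
        float *d_buf[2];
        for (int s = 0; s < 2; ++s) {
            cudaStreamCreate(&stream[s]);
            cudaMalloc(&d_buf[s], chunk * sizeof(float));
        }

        for (int c = 0; c < n_chunks; ++c) {
            int s = c % 2;                                   // alternate streams and buffers
            float *h_chunk = h_data + (size_t)c * chunk;
            // Operations in one stream run in order; operations in different streams
            // may overlap, so one DMA engine can upload while the other downloads.
            cudaMemcpyAsync(d_buf[s], h_chunk, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, stream[s]);
            process_chunk<<<(chunk + 255) / 256, 256, 0, stream[s]>>>(d_buf[s], chunk);
            cudaMemcpyAsync(h_chunk, d_buf[s], chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, stream[s]);
        }

        for (int s = 0; s < 2; ++s) {
            cudaStreamSynchronize(stream[s]);
            cudaStreamDestroy(stream[s]);
            cudaFree(d_buf[s]);
        }
    }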

34 Enhanced Software Support: full C++ support (virtual functions, try/catch hardware support); system call support (pipes, semaphores, printf, etc.); unified 64-bit memory addressing.
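Device-side printf, one of the system calls listed above, is available on Fermi-class (compute capability 2.0) GPUs. A minimal sketch, assuming the file is compiled for that architecture:

    #include <cstdio>                 // device printf is declared through stdio

    __global__ void hello(void)
    {
        // Each thread prints its own coordinates; output is flushed on synchronization.
        printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
    }

    int main(void)
    {
        hello<<<2, 4>>>();            // e.g. nvcc -arch=sm_20 hello.cu
        cudaDeviceSynchronize();      // wait for the kernel and flush the print buffer
        return 0;
    }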

35 Performance Benchmarks

36 8x Higher Linpack. [Charts comparing a CPU server with a GPU-CPU server:] performance (Gflops): 8x higher; performance/$ (Gflops/$K): about 5x higher (11 vs. 60); performance/watt (Gflops/kwatt): 4.5x higher. CPU 1U server: 2x Intel Xeon X5550 (Nehalem), 2.66 GHz, 48 GB memory, $7K, 0.55 kW. GPU-CPU 1U server: 2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1.0 kW.

37 Performance Summary (speed-up charts, preliminary data): MIDG (discontinuous Galerkin solvers for PDEs), AMBER molecular dynamics (mixed precision), lock-exchange problem in OpenCurrent, OpenEye ROCS virtual drug screening, and radix sort from the CUDA SDK.

38 Standard FFT Library: CUFFT 3.1. [Charts: single-precision and double-precision FFT performance (Gflops) vs. transform size.] CUFFT 3.1 on NVIDIA Tesla C1060 and Tesla C2050 (Fermi); MKL on a quad-core Intel Xeon 5550, 2.67 GHz.
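For reference, a minimal CUFFT call sequence for an in-place 1D single-precision complex-to-complex transform looks like the sketch below; the transform size is an arbitrary example and error checking is omitted (link with -lcufft).

    #include <cufft.h>
    #include <cuda_runtime.h>

    void fft_1d_example(void)
    {
        const int N = 1 << 20;                    // example transform size (assumption)
        cufftComplex *d_data;
        cudaMalloc(&d_data, N * sizeof(cufftComplex));
        // ... fill d_data with input samples, e.g. via cudaMemcpy from the host ...

        cufftHandle plan;
        cufftPlan1d(&plan, N, CUFFT_C2C, 1);      // 1D complex-to-complex plan, batch of 1
        cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);   // in-place forward FFT

        cufftDestroy(plan);
        cudaFree(d_data);
    }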

39 Standard BLAS Library: CUBLAS 3.1. [Charts: single-precision SGEMM and double-precision DGEMM performance (Gflops) vs. matrix size, from 256x256 up to 8192x8192.] CUBLAS 3.1 on NVIDIA Tesla C1060 and Tesla C2050 (Fermi); MKL on a quad-core Intel Xeon 5550, 2.67 GHz.
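A minimal SGEMM call through the legacy CUBLAS API (cublas.h), which is the interface of the CUBLAS 3.1 era, is sketched below; the matrices are assumed to be column-major and square, and error checking is omitted.

    #include <cublas.h>

    // Computes C = A * B for column-major n x n single-precision matrices.
    void sgemm_example(int n, const float *A, const float *B, float *C)
    {
        float *d_A, *d_B, *d_C;
        cublasInit();
        cublasAlloc(n * n, sizeof(float), (void**)&d_A);
        cublasAlloc(n * n, sizeof(float), (void**)&d_B);
        cublasAlloc(n * n, sizeof(float), (void**)&d_C);

        cublasSetMatrix(n, n, sizeof(float), A, n, d_A, n);   // host -> device
        cublasSetMatrix(n, n, sizeof(float), B, n, d_B, n);

        // C = 1.0 * A * B + 0.0 * C, no transposes
        cublasSgemm('n', 'n', n, n, n, 1.0f, d_A, n, d_B, n, 0.0f, d_C, n);

        cublasGetMatrix(n, n, sizeof(float), d_C, n, C, n);   // device -> host

        cublasFree(d_A); cublasFree(d_B); cublasFree(d_C);
        cublasShutdown();
    }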

40 Matrix Size for Best CUBLAS Performance. [Charts: SGEMM and DGEMM Gflops across matrix sizes up to roughly 8192, showing that sizes which are multiples of the library's preferred block size achieve the best performance.] CUBLAS 3.1 on NVIDIA Tesla C1060 and Tesla C2050 (Fermi); MKL on a quad-core Intel Xeon 5550, 2.67 GHz.

41 CULA 1.3: LAPACK Library from EM Photonics. [Charts: GFlops vs. matrix size on Core i7-920, Tesla C1060, and Tesla C2050; double-precision results.] QR decomposition (DGEQRF), Householder method, operation count estimated as 1.33N^3. LU decomposition (DGETRF), partial pivoting, operation count estimated as 0.66N^3. Singular value decomposition (DGESVD), left and right singular vectors, operation count estimated as 21N^3. Data courtesy of EM Photonics.

42 Sparse Matrix-Vector Multiplication (SpMV). [Charts: single- and double-precision Gflops for Tesla C2050 (ECC off), Tesla C2050 (ECC on), Tesla C1060, and Intel Xeon 5550.] SpMV with CUDA 3.0 on Tesla C1060 and Tesla C2050; MKL 10.2 on Intel Xeon 5550, 2.67 GHz. Preliminary data.
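As a point of reference for what these benchmarks measure, the simplest CUDA formulation of SpMV over a CSR matrix assigns one thread per row. The sketch below is that textbook scalar-CSR kernel, not the tuned code behind the numbers above.

    // Scalar CSR SpMV: y = A*x, one thread per row.
    __global__ void spmv_csr_scalar(int num_rows,
                                    const int *row_ptr,   // CSR row offsets, length num_rows + 1
                                    const int *col_idx,   // column index of each nonzero
                                    const float *values,  // value of each nonzero
                                    const float *x,
                                    float *y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < num_rows) {
            float dot = 0.0f;
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                dot += values[j] * x[col_idx[j]];
            y[row] = dot;
        }
    }

    // Launch example: spmv_csr_scalar<<<(num_rows + 255) / 256, 256>>>(...);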

43 Australia GPU Users Groups: informal local special-interest groups that bring together GPU users from all fields and experience levels. Groups in Brisbane, Sydney, Perth, and soon Melbourne. First meetings were held in June 2010; second meetings are likely in August. Also look for the Australia GPU Users group on linkedin.com.

44 NVIDIA Academic Partnership: support for faculty research and teaching efforts. Easy: small equipment gifts (1-2 GPUs) and significant discounts on GPU purchases, especially Quadro and Tesla equipment; useful for cost matching. Competitive: research contracts, small cash grants (typically ~$25K gifts), and medium-scale equipment donations (10-30 GPUs); informal proposals, reviewed quarterly; focus areas: GPU computing, especially with an educational mission or component. NVIDIA PhD Fellowship: accepting applications in November!

45 GPU Technology Conference 2010: Monday, Sept. 20 through Thursday, Sept. 23, 2010, San Jose Convention Center, San Jose, California. The most important event in the GPU ecosystem: learn about seismic shifts in GPU computing; preview disruptive technologies and emerging applications; get tools and techniques to impact mission-critical projects; network with experts, colleagues, and peers across industries. "I consider the GPU Technology Conference to be the single best place to see the amazing work enabled by the GPU. It's a great venue for meeting researchers, developers, scientists, and entrepreneurs from around the world." -- Professor Hanspeter Pfister, Harvard University and GTC 2009 keynote speaker. Opportunities: call for submissions (sessions and posters); sponsors and exhibitors (reach decision makers); CEO on Stage showcase for startups (tell your story to VCs and analysts).

46 Tesla GPU Computing Questions?

47 Tesla GPU Computing Products

48 NVIDIA Tesla GPU Computing Products

Data center server modules:
- Tesla M2070 / M2050: 1 T20 GPU; 1030 GFlops single precision; 515 GFlops double precision; 6 GB / 3 GB memory.
- Tesla M1060: 1 T10 GPU; 933 GFlops single precision; 78 GFlops double precision; 4 GB memory; 102 GB/s memory bandwidth.

Data center 1U systems:
- Tesla S2050: 4 T20 GPUs; 4120 GFlops single precision; 2060 GFlops double precision; 12 GB memory.
- Tesla S1070: 4 T10 GPUs; 4140 GFlops single precision; 346 GFlops double precision; 16 GB memory (4 GB per GPU); 102 GB/s memory bandwidth per GPU.

Workstation boards:
- Tesla C2070 / C2050: 1 T20 GPU; 1030 GFlops single precision; 515 GFlops double precision; 6 GB / 3 GB memory; 144 GB/s memory bandwidth.
- Tesla C1060: 1 T10 GPU; 933 GFlops single precision; 78 GFlops double precision; 4 GB memory; 102 GB/s memory bandwidth.

49 Announced OEM Servers with Tesla M-series GPUs

50 OEM Servers with Tesla M2050 GPUs: SuperServer (2 CPUs + 2 GPUs in 1U); Tetra (2 CPUs + 4 GPUs in 1U); GreenBlade (10 CPUs + 10 GPUs in 5U); a 4U blade system (CPUs + 8 GPUs in 4U).

51 OEM Servers with Tesla M1060 GPUs: SuperServer 6016GT-TF (2 CPUs + 2 GPUs in 1U); Cray CX1000 and Bull Bullx (36 CPUs + up to 18 GPUs in 7U); Tyan (CPUs + 8 GPUs in 4U).

52 "In testing our key applications, the Tesla GPUs delivered speed-ups that we had never seen before, sometimes even orders of magnitude." -- Satoshi Matsuoka, Professor, Tokyo Institute of Technology. "I believe history will record Fermi as a significant milestone." -- Dave Patterson, Director, Parallel Computing Research Laboratory, U.C. Berkeley; co-author of Computer Architecture: A Quantitative Approach. "Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs." -- Jack Dongarra, Professor, University of Tennessee; author of Linpack.

53 $1M Budget

54 4.5x Lower Power & Cooling Costs. A 37-TeraFlop system (a Top 150 system): 2 racks of GPUs+CPUs vs. 15 racks of CPUs (7x less space required); $740K vs. $3.8M (5x lower cost); $117K vs. $524K in power every year (4.5x power savings). As per the November 2009 Top 500 list.

55 Scaling to a 5-PetaFlop Cluster. [Chart: power (MegaWatts) vs. Linpack performance (Teraflops) for Jaguar (x86 CPU), JUGENE (BlueGene), Roadrunner (Cell), and Nebulae (Tesla GPU), extrapolated to 5 PetaFlops for an x86 CPU cluster vs. a Tesla GPU cluster.]

56 What if Every Supercomputer Had Fermi? [Chart: Linpack Teraflops of the Top 500 supercomputers (Nov 2009), with GPU-based configurations marked against the list: 110 TeraFlops for $2.2M, 55 TeraFlops for $1.1M, and 37 TeraFlops for $740K.]

57 Defense / Federal Agencies. Opportunities: defense contractors, federal agencies, and defense services, with speed-ups of 10x-50x. Software available: GIS (Manifold, PCI Geomatics, DigitalGlobe); signal processing (GPU VSIPL, MATLAB GPU plugin available); UAV video analysis (MotionDSP Ikena); virtual prototyping (RealityServer); surveillance and cryptography.

58 Targeting Multiple Platforms with CUDA. CUDA C/C++ is compiled by NVCC (NVIDIA CUDA Toolkit) to PTX for NVIDIA GPUs; Ocelot translates PTX to multi-core CPUs; MCUDA translates CUDA to multi-core CPU code; SWAN translates CUDA to OpenCL for other GPUs.

59 Soul of a Supercomputer. [Chart: Linpack Teraflops per rack for a Nehalem 4-core CPU, a Westmere 6-core CPU, Tesla 10-series, and Tesla 20-series.]

60 The Tesla Advantage

61 Parallel Computing on All GPUs: 100+ Million CUDA GPUs Deployed. GeForce: entertainment. Tesla: high-performance computing. Quadro: design & creation.

62 Tesla: Built for Professional Computing

Features and benefits:
- 4x higher double precision (on 20-series): higher performance for scientific CUDA applications.
- ECC, available only on Tesla & Quadro (on 20-series): data reliability inside the GPU and on DRAM memories.
- Bi-directional PCI-E communication (Tesla has dual DMA engines; GeForce has only one): higher performance for CUDA applications by overlapping communication and computation.
- Larger memory for larger data sets (3 GB and 6 GB products): higher performance on a wide range of applications (medical, oil & gas, manufacturing, FEA, CAE).
- Cluster management software tools available on Tesla only: needed for GPU monitoring and job scheduling in data center deployments.
- TCC (Tesla Compute Cluster) driver for Windows, supported only on Tesla: higher performance for CUDA applications due to lower kernel launch overhead; TCC adds support for RDP and Services.
- Integrated OEM workstations and servers: trusted, reliable systems built for Tesla products.
- Professional ISVs certify CUDA applications only on Tesla: bug reproduction, support, and feature requests handled for Tesla.

Quality & warranty:
- 2 to 4 days of stress testing and memory burn-in, plus added margin in memory and core clocks: built for 24/7 computing in data center and workstation environments.
- Manufactured and guaranteed by NVIDIA, with a 3-year NVIDIA warranty: reliable, long-life products.

Support & lifecycle:
- Enterprise support and higher priority for CUDA bugs and requests: ability to influence the CUDA and GPU roadmap and get early access to feature requests.
- Extended product availability with a 6-month EOL notice, and no changes in key components such as GPU and memory without notice (always the same clocks for known, reliable performance): reliable product supply.

63 Benefits of Larger Tesla Memory. [Chart: relative performance of a 1 GB GPU board vs. a Tesla C1060 with 4 GB on seismic processing and AMBER molecular dynamics; the application runs out of memory with 1 GB of DRAM.] Larger GPU memory enables operating on larger data sets; applications that run out of memory include molecular dynamics, medical imaging, seismic processing, CFD, electro-magnetics, finite element analysis, data analytics, and database search.

64 Tesla Data Center Software Tools. System management tools: nvidia-smi for thermal, power, and system monitoring; integration of nvidia-smi with the Ganglia system monitoring software; IPMI support for Tesla M-based hybrid CPU-GPU servers. Clustering software: Platform Computing Cluster Manager, Penguin Computing Scyld, Cluster Corp Rocks Roll, Altair PBS Works.

65 Tesla Compute Cluster (TCC) Driver Available: enables Windows HPC on Tesla. Tesla driver for Windows 7, Server 2008 R2, Windows Vista, and Server 2008; Tesla 8-series, 10-series, and 20-series GPUs supported. Works with CUDA C/C++ and Fortran; does not support OpenCL, OpenGL, or DirectX. Enables the following under Windows with CUDA: RDP (Remote Desktop); launching CUDA applications via Windows Services; no Windows timeout issues; no penalty on launch overhead; KVM-over-IP enabled (CPU server on-board graphics chipset enabled).

66 Tesla C-series Display Features. Why does the Tesla C2050 have DVI out? Even researchers and engineers need to see their work! Tesla targets high-performance computing, Quadro targets visual computing; compared with Quadro, the Tesla C2050 offers the same double-precision FP performance, much lower geometry performance (Tris/sec), lower Viewperf performance (geomean), a baseline graphics feature level, no 12-bit/30-bit or quad-buffered stereo display, no SLI MultiOS, a single dual-link DVI-I output, no workstation ISV certifications, and Tesla ISV certifications.

67 Increasing Number of CUDA Applications (already available plus 2010 Q2-Q4 releases; a mix of released, announced, and unannounced products). Tools & libraries: CULA LAPACK library, Thrust C++ template library, CUDA C/C++, PGI Fortran, Jacket MATLAB plugin, Nsight Visual Studio IDE, NPP performance primitives, Allinea debugger, Platform cluster management, TotalView debugger, Bright Cluster Management, Mathematica, PGI Accelerator enhancements, CAPS HMPP enhancements, MathWorks MATLAB. Oil & gas: seismic analysis (ffA, HeadWave, Geostar, Seismic City, Acceleware), seismic interpretation, reservoir simulation. Bio-chemistry: AMBER, GROMACS, GROMOS, HOOMD, LAMMPS, NAMD, VMD, BigDFT, ABINIT, TeraChem, further quantum chemistry and molecular dynamics codes. Bio-informatics: Hex protein docking, CUDA-BLASTP, GPU-HMMER, MUMmerGPU, MEME, CUDA-EC, protein docking, short-read sequence analysis. Video & rendering: Fraunhofer JPEG2K, OptiX ray tracing engine, mental ray with iray, MainConcept video encoder, Elemental Video Live, 3D CAD software with iray. Finance: NAG RNGs, NumeriX counterparty risk, SciComp SciFinance, risk analysis products, credit risk analysis, an ISV trading platform. CAE: Autodesk Moldflow, OpenCurrent CFD/PDE library, Moldex3D, ACUSIM AcuSolve CFD, structural mechanics ISVs, MSC MARC, MSC Nastran, several CAE ISVs. EDA: electro-magnetics (Agilent, CST, Remcom, SPEAG), Agilent ADS, SPICE simulators, a Verilog simulator, lithography products.

68 Explosive Growth in CUDA Developers: 200,000 CUDA developer downloads; universities teaching parallel programming using NVIDIA GPUs.

69 NVIDIA's Leadership in GPU Computing: applications and papers using NVIDIA GPUs; 10 GPGPU books using NVIDIA GPUs.
