Supercomputing at 1/10th the Cost
1 Supercomputing at 1/10th the Cost
2 GPU Computing* — CPU + GPU Co-Processing: CPU (4 cores, 48 GigaFlops DP) + GPU (515 GigaFlops DP). *aka GPGPU 2
3 Reported GPU speedups across fields: 146X Medical Imaging (U of Utah); 36X Molecular Dynamics (U of Illinois, Urbana); 18X Video Transcoding (Elemental Tech); 50X Matlab Computing (AccelerEyes); 100X Astrophysics (RIKEN); 149X Financial Simulation (Oxford); 47X Linear Algebra (Universidad Jaime); 20X 3D Ultrasound (Techniscan); 130X Quantum Chemistry (U of Illinois, Urbana); 30X Gene Sequencing (U of Maryland) 3
4 Tesla Data Center & Workstation GPU Solutions. Tesla M-series GPUs (M2070, M2050, M1060) for integrated CPU-GPU servers & blades — example: Nitro G5 1U. Tesla S-series 1U systems (S2050, S1070), deployed as an OEM CPU server + Tesla S-series 1U. Tesla C-series GPUs (C2070, C2050, C1060) for workstations with 2 to 4 Tesla GPUs. 4
5 Fermi: The Computational GPU. Performance: 7x the double precision of CPUs; IEEE SP & DP floating point. Flexibility: shared memory increased from 16 KB to 48 KB; added L1 and L2 caches; ECC on all internal and external memories; enables up to 1 TeraByte of GPU memory; high-speed GDDR5 memory interface. Usability: multiple simultaneous tasks on GPU; 10x faster atomic operations; C++ support; system calls and printf support. [Block diagram: Host I/F, GigaThread scheduler, L2 cache, DRAM interfaces] 5
6 NVIDIA Tesla GPU Computing Products — Data Center Products and Workstation:
- Server Modules: Tesla M2070 / M2050 (1 T20 GPU; 1030 GFlops SP; 515 GFlops DP; 6 GB / 3 GB); Tesla M1060 (1 T10 GPU; 933 GFlops SP; 78 GFlops DP; 4 GB; 102 GB/s)
- 1U Systems: Tesla S2050 (4 T20 GPUs; 4120 GFlops SP; 2060 GFlops DP; 12 GB); Tesla S1070 (4 T10 GPUs; 4140 GFlops SP; 346 GFlops DP; 16 GB, 4 GB per GPU; 102 GB/s per GPU)
- Workstation Boards: Tesla C2070 / C2050 (1 T20 GPU; 1030 GFlops SP; 515 GFlops DP; 6 GB / 3 GB; 144 GB/s); Tesla C1060 (1 T10 GPU; 933 GFlops SP; 78 GFlops DP; 4 GB; 102 GB/s) 6
7 NVIDIA GPGPU Developer Eco-System. Numerical packages: MATLAB, Mathematica, NI LabView, PyCUDA. Debuggers & profilers: cuda-gdb, NV Visual Profiler, Parallel Nsight for Visual Studio, Allinea, TotalView. GPU compilers: C, C++, Fortran, OpenCL, DirectCompute, Java, Python. Parallelizing compilers: PGI Accelerator, CAPS HMPP, MCUDA, OpenMP. Libraries: BLAS, FFT, LAPACK, NPP, Video/Imaging, GPULib. GPGPU consultants & training, OEM solution providers: ANEO, GPU Tech. 7
8 World's Fastest GPU Supercomputer — Top500: #1 in Asia, #2 in the World 8
9 #2: Dawning Nebulae — 1.27 Petaflops, 4640 GPUs, 2.55 MWatts 9
10 2x Performance / Watt. [Chart: power (MegaWatts) vs. Linpack performance (Teraflops) for Jaguar (x86 CPU), JUGENE (BlueGene), Roadrunner (Cell), Nebulae (Tesla GPU), and IPE, CAS (Tesla GPU)] 10
11 8x Higher Linpack Performance. [Charts: Performance (Gflops), Performance / $ (Gflops/$K), and Performance / watt (Gflops/kwatt), CPU server vs. GPU-CPU server] CPU 1U Server: 2x Intel Xeon X5550 (Nehalem) 2.66 GHz, 48 GB memory, $7K, 0.55 kW. GPU-CPU 1U Server: 2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1.0 kW 11
12 20+ Oil & Gas Companies Porting to CUDA. GPU vs CPU improvements — successful customers: 18x-27x performance/watt, 20x-31x performance/space, 15x-20x performance/cost. Oil & Gas ISVs: 12x-17x performance/watt, 15x-20x performance/space, 10x-12x performance/cost. 12
13 Oil & Gas: Seismic Processing. Equal performance: 32 Tesla S1070s vs. 2000 CPU servers — 31x less space; ~$400K vs. ~$8M — 20x lower cost; 45 kWatts vs. 1200 kWatts — 27x lower power. 13
14 Finance: 10+ Banks Porting to CUDA. Derivative pricing using SciFinance — basket equity-linked structured note: 2 Tesla C1060s, 0.25 secs (124x); 1 Tesla C1060 (77x); baseline Intel Xeon (2.6 GHz). Several unannounced Finance ISVs; UnRisk. 14
15 Bloomberg: Bond Pricing. 48 GPUs vs. 2000 CPUs — 42x lower space; $144K vs. $4 Million — 28x lower cost; $31K/year vs. $1.2 Million/year — 38x lower power cost. 15
16 Tesla Bio WorkBench: Bio-Chemistry & Bio-Informatics. Applications: TeraChem, Hex (docking), LAMMPS, CUDA-BLASTP, CUDA-MEME, MUMmerGPU, CUDA-EC. Community: downloads, documentation, technical papers, discussion forums, benchmarks & configurations. Platforms: Tesla Personal Supercomputer, Tesla GPU clusters. 16
17 Finding a Better Shampoo. Equal performance: 1 Tesla Personal Supercomputer vs. 32 CPU servers. No data center required: 1 kWatt vs. 21 kWatts. 13x lower system cost: $7K vs. $128K. 19x lower power & cooling cost: $2K vs. $37K. 17
18 The Tesla Advantage 18
19 Parallel Computing on All GPUs — 100+ Million CUDA GPUs Deployed. GeForce: Entertainment. Tesla™: High-Performance Computing. Quadro: Design & Creation. 19
20 Tesla: Built for Professional Computing.
Features → Benefits:
- 4x higher double precision (on 20-series) → higher performance for scientific CUDA applications.
- ECC, only on Tesla & Quadro (on 20-series) → data reliability inside the GPU and on DRAM memories.
- Bi-directional PCI-E communication (Tesla has dual DMA engines; GeForce has only one) → higher performance for CUDA applications by overlapping communication & computation.
- Larger memory for larger data sets (3 GB and 6 GB products) → higher performance on a wide range of applications (medical, oil & gas, manufacturing, FEA, CAE).
- Cluster management software tools, available on Tesla only → needed for GPU monitoring and job scheduling in data center deployments.
- TCC (Tesla Compute Cluster) driver for Windows, supported only on Tesla → higher performance for CUDA applications due to lower kernel launch overhead; TCC adds support for RDP and Services.
- Integrated OEM workstations and servers → trusted, reliable systems built for Tesla products.
- Professional ISVs certify CUDA applications only on Tesla → bug reproduction, support, and feature requests for Tesla only.
Quality & Warranty:
- 2-to-4-day stress testing & memory burn-in; added margin in memory and core clocks → built for 24/7 computing in data center and workstation environments; always the same clocks for known, reliable performance.
- Manufactured & guaranteed by NVIDIA; no changes in key components like GPU and memory without notice → reliable, long-life products.
Support & Lifecycle:
- 3-year warranty from NVIDIA; enterprise support with higher priority for CUDA bugs and requests → ability to influence the CUDA and GPU roadmap; early access to feature requests.
- Long-term (months of) availability + 6-month EOL notice → reliable product supply. 20
21 Programming GPUs 21
22 Doing GPGPU Right: Combination of Hardware and Software. GPU computing applications sit on libraries and middleware (cuFFT, cuBLAS, CULA LAPACK, NPP & CUDPP, video libraries, PhysX physics, OptiX ray tracing, mental ray / iray rendering, RealityServer 3D web services) and on languages (C, C++, Fortran, OpenCL, DirectCompute, Java and Python), all running on the NVIDIA GPU via the CUDA Parallel Computing Architecture. 22 (OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.)
23 CUDA C/C++: C with a few new keywords

Standard C code:

void saxpy_serial(int n, float a, float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a*x[i] + y[i];
}
// Invoke serial SAXPY kernel
saxpy_serial(n, 2.0, x, y);

Parallel CUDA C code:

__global__ void saxpy_parallel(int n, float a, float *x, float *y)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n) y[i] = a*x[i] + y[i];
}
// Invoke parallel SAXPY kernel with 256 threads/block
int nblocks = (n + 255) / 256;
saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);

23
24 CUDA Programming Effort / Performance. Source: MIT CUDA Course 24
25 Fermi NEXT GENERATION ARCHITECTURE 25
26 Introducing the Fermi Architecture: 3 billion transistors; 512 cores (max); DP performance 50% of SP; ECC; L1 and L2 caches; GDDR5 memory; up to 1 Terabyte of GPU memory; concurrent kernels; C++. 26
27 Fermi SM Architecture: 32 CUDA cores per SM (512 total); double precision at 50% of single precision — 8x over GT200; dual thread scheduler; 64 KB of RAM per SM for shared memory and L1 cache (configurable). 27
28 CUDA Core Architecture: new IEEE floating-point standard, surpassing even the most advanced CPUs; fused multiply-add (FMA) instruction for both single and double precision; newly designed integer ALU optimized for 64-bit and extended-precision operations. 28
29 Cached Memory Hierarchy: first GPU architecture to support a true cache hierarchy in combination with on-chip shared memory. An L1 cache per SM (per 32 cores) improves bandwidth and reduces latency; a unified 768 KB L2 cache gives fast, coherent data sharing across all cores in the GPU. (Parallel DataCache memory hierarchy.) 29
30 ECC: ECC protection for DRAM (supported for GDDR5 memory) and for all major internal memories — register file, shared memory, L1 cache, L2 cache. Detects 2-bit errors, corrects 1-bit errors (per word). 30
31 GigaThread Hardware Thread Scheduler: hierarchically manages thousands of simultaneously active threads; 10x faster application context switching; concurrent kernel execution. 31
32 GigaThread Hardware Thread Scheduler — concurrent kernel execution + faster context switch. [Diagram: serial kernel execution vs. parallel kernel execution] 32
33 GigaThread Streaming Data Transfer Engine: dual DMA engines allow simultaneous CPU→GPU and GPU→CPU data transfer, fully overlapped with CPU and GPU processing time. [Activity snapshot diagram] 33
34 Enhanced Software Support: full C++ support (virtual functions, try/catch hardware support); system call support (pipes, semaphores, printf, etc.); unified 64-bit memory addressing. 34
35 Performance Benchmarks 35
36 8x Higher Linpack. [Charts: Performance (Gflops) — 8x; Performance / $ (Gflops/$K) — 5x, CPU server 11 vs. GPU-CPU server 60; Performance / watt (Gflops/kwatt) — 4.5x] CPU 1U Server: 2x Intel Xeon X5550 (Nehalem) 2.66 GHz, 48 GB memory, $7K, 0.55 kW. GPU-CPU 1U Server: 2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1.0 kW 36
37 Performance Summary (preliminary data). [Speed-up charts: MIDG — discontinuous Galerkin solvers for PDEs; AMBER molecular dynamics (mixed precision); OpenCurrent lock-exchange problem; OpenEye ROCS virtual drug screening; Radix Sort from the CUDA SDK] 37
38 Standard FFT Library: cuFFT 3.1. [Charts: single-precision and double-precision FFT Gflops vs. transform size] cuFFT 3.1: NVIDIA Tesla C1060, Tesla C2050 (Fermi). MKL: Quad-Core Intel Xeon 5550, 2.67 GHz 38
39 Standard BLAS Library: cuBLAS 3.1. [Charts: single-precision SGEMM and double-precision DGEMM Gflops vs. matrix size, 256x256 up to 8192x8192] cuBLAS 3.1: NVIDIA Tesla C1060, Tesla C2050 (Fermi). MKL: Quad-Core Intel Xeon 5550, 2.67 GHz 39
40 Matrix Size for Best cuBLAS Performance. [Charts: SGEMM and DGEMM Gflops vs. matrix size, sampled from 80 up to 8112 at the multiples where cuBLAS performs best] cuBLAS 3.1: NVIDIA Tesla C1060, Tesla C2050 (Fermi). MKL: Quad-Core Intel Xeon 5550, 2.67 GHz 40
41 CULA 1.3: LAPACK Library from EM Photonics (double-precision results; data courtesy of EM Photonics). [Charts: Gflops vs. matrix size on Core i7-920, Tesla C1060, and Tesla C2050 for QR decomposition (DGEQRF, Householder method; operation count estimated as 1.33N^3), LU decomposition (DGETRF, partial pivoting; estimated as 0.66N^3), and singular value decomposition (DGESVD, left & right singular vectors; estimated as 21N^3)] 41
42 Sparse Matrix-Vector Multiplication (SpMV) — preliminary data. [Charts: single-precision and double-precision Gflops for Tesla C2050 (ECC off), Tesla C2050 (ECC on), Tesla C1060, and Intel Xeon 5550] SpMV: CUDA 3.0, Tesla C1060 and Tesla C2050. MKL 10.2: Intel Xeon 5550, 2.67 GHz 42
43 Australia GPU Users Groups — informal local special interest groups bring together GPU users from all fields and experience levels. Groups in Brisbane, Sydney, Perth, and soon Melbourne. First meetings held June 2010; second meetings likely in August. Also look for the Australia GPU Users group on linkedin.com. 43
44 NVIDIA Academic Partnership — support for faculty research & teaching efforts. Easy: small equipment gifts (1-2 GPUs); significant discounts on GPU purchases, especially Quadro and Tesla equipment, useful for cost matching on research contracts. Competitive: small cash grants (typically ~$25K gifts); medium-scale equipment donations (10-30 GPUs); informal proposals, reviewed quarterly. Focus areas: GPU computing, especially with an educational mission or component. NVIDIA PhD Fellowship: accepting applications in November! 44
45 GPU Technology Conference 2010 — Monday, Sept. 20 through Thursday, Sept. 23, 2010, San Jose Convention Center, San Jose, California. The most important event in the GPU ecosystem: learn about seismic shifts in GPU computing; preview disruptive technologies and emerging applications; get tools and techniques to impact mission-critical projects; network with experts, colleagues, and peers across industries. "I consider the GPU Technology Conference to be the single best place to see the amazing work enabled by the GPU. It's a great venue for meeting researchers, developers, scientists, and entrepreneurs from around the world." -- Professor Hanspeter Pfister, Harvard University and GTC 2009 keynote speaker. Opportunities: call for submissions (sessions & posters); sponsors/exhibitors (reach decision makers); CEO on Stage — showcase for startups, tell your story to VCs and analysts. 45
46 Tesla GPU Computing Questions?
47 Tesla GPU Computing Products 47
49 Announced OEM Servers w/ Tesla M-series GPUs 49
50 OEM Servers with Tesla M2050 GPUs: SuperServer — 2 CPUs + 2 GPUs in 1U; Tetra — 2 CPUs + 4 GPUs in 1U; GreenBlade — 10 CPUs + 10 GPUs in 5U; B-series — CPUs + 8 GPUs in 4U. 50
51 OEM Servers with Tesla M1060 GPUs: SuperServer 6016GT-TF — 2 CPUs + 2 Tesla M1060 GPUs in 1U; Cray CX1000 and Bull Bullx — 36 CPUs + up to 18 Tesla M1060 GPUs in 7U; Tyan B-series — CPUs + 8 Tesla M1060 GPUs in 4U. 51
52 "In testing our key applications, the Tesla GPUs delivered speed-ups that we had never seen before, sometimes even orders of magnitude." -- Satoshi Matsuoka, Professor, Tokyo Institute of Technology. "I believe history will record Fermi as a significant milestone." -- Dave Patterson, Director, Parallel Computing Research Laboratory, U.C. Berkeley; co-author of Computer Architecture: A Quantitative Approach. "Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs." -- Jack Dongarra, Professor, University of Tennessee; author of Linpack. 52
53 $1M Budget 53
54 4.5x Lower Power & Cooling Costs — 37 TeraFlop system (a Top 150 system): 2 racks of GPU+CPUs vs. 15 racks of CPUs — 7x less space required; $740K vs. $3.8M — 5x lower cost; $117K vs. $524K per year — 4.5x power savings every year. As per the November 2009 Top 500 list. 54
55 Scaling to a 5 PetaFlop Cluster. [Chart: power (MegaWatts, up to 25) vs. Linpack performance (Teraflops), extrapolated for x86 CPU (Jaguar), BlueGene (JUGENE), Cell (Roadrunner), and Tesla GPU (Nebulae)] 55
56 What if Every Supercomputer Had Fermi? [Chart: Linpack Teraflops across the Top 500 supercomputers (Nov 2009), with Fermi GPU cluster equivalents: 110 TeraFlops for $2.2M, 55 TeraFlops for $1.1M, and 37 TeraFlops for $740K] 56
57 Defense / Federal — opportunities with defense contractors, federal agencies, and defense services; speedups 10x-50x. Software available: GIS — Manifold, PCI Geomatics, DigitalGlobe; signal processing — GPU VSIPL, MATLAB GPU plugin; UAV video analysis — MotionDSP Ikena; virtual prototyping — RealityServer; surveillance, cryptography. 57
58 Targeting Multiple Platforms with CUDA: NVCC (NVIDIA CUDA Toolkit) compiles CUDA C/C++ to PTX for NVIDIA GPUs; Ocelot retargets PTX to multi-core CPUs; MCUDA translates CUDA to multi-core CPUs; Swan translates CUDA to OpenCL for other GPUs. 58
59 Soul of a Supercomputer. [Chart: Linpack Teraflops per rack, up to 25, for Nehalem 4-core CPU, Westmere 6-core CPU, Tesla 10-series, and Tesla 20-series] 59
63 Benefits of Larger Tesla Memory — larger GPU memory enables operating on larger data sets. [Chart: relative performance of a 1 GB GPU board vs. Tesla C1060 with 4 GB on seismic processing, AMBER molecular dynamics, and data analytics / database search; the application runs out of memory with 1 GB of DRAM] Applications that run out of memory: molecular dynamics, medical imaging, seismic processing, CFD, electro-magnetics, finite element analysis. 63
64 Tesla Data Center Software Tools. System management: nv-smi (thermal, power, and system monitoring); integration of nv-smi with Ganglia system monitoring software; IPMI support for Tesla M-based hybrid CPU-GPU servers. Clustering software: Platform Computing Cluster Manager; Penguin Computing Scyld; Cluster Corp Rocks Roll; Altair PBS Works. 64
65 Tesla Compute Cluster (TCC) Driver Available — enables Windows HPC on Tesla. Tesla driver for Windows 7, Server 2008 R2, Windows Vista, and Server 2008; Tesla 8-series, 10-series, and 20-series GPUs supported. Works with CUDA C/C++ and Fortran; does not support OpenCL, OpenGL, or DirectX. Enables the following under Windows with CUDA: RDP (Remote Desktop); launching CUDA applications via Windows Services; no Windows timeout issues; no penalty on launch overhead; KVM-over-IP (CPU server on-board graphics chipset enabled). 65
66 Tesla C-series Display Features — why does Tesla C2050 have DVI out? Even researchers and engineers need to see their work! Tesla is built for high-performance computing; Quadro for visual computing. Tesla C2050 vs. Quadro: double-precision FP performance — same as Quadro; geometry performance (tris/sec) — much lower than Quadro; Viewperf performance (geomean) — lower than Quadro; graphics features — baseline level (no 12-bit/30-bit, no quad-buffered stereo, single DVI-I dual link display, no SLI MultiOS); workstation ISV certifications — no; Tesla ISV certifications — yes. 66
67 Increasing Number of CUDA Applications — already available through 2010 Q2/Q3/Q4 (released, announced, and unannounced products).
Tools & Libraries: CULA LAPACK library; Thrust C++ template library; CUDA C/C++; PGI Fortran; Jacket MATLAB plugin; Nsight Visual Studio IDE; NPP performance primitives; Allinea debugger; Platform cluster management; TotalView debugger; Bright Cluster Management; Mathematica; PGI Accelerator enhancements; CAPS HMPP enhancements; MathWorks MATLAB.
Oil & Gas: seismic analysis — ffA, HeadWave, Geostar; Seismic City, Acceleware; seismic interpretation; reservoir simulation products.
Bio-Chemistry: AMBER, GROMACS, GROMOS, HOOMD, LAMMPS, NAMD, VMD; BigDFT, ABINIT, TeraChem; further quantum chemistry and molecular dynamics codes.
Bio-Informatics: Hex protein docking; CUDA-BLASTP, GPU-HMMER, MUMmerGPU, MEME, CUDA-EC; protein docking; short-read sequence analysis.
Video & Rendering: Fraunhofer JPEG2K; OptiX ray tracing engine; mental ray with iray; MainConcept video encoder; Elemental Video Live; 3D CAD software with iray.
Finance: NAG RNGs; NumeriX counterparty risk; SciComp SciFinance; risk analysis products; credit risk analysis ISV; trading platform ISV.
CAE: Autodesk Moldflow; OpenCurrent CFD/PDE library; Moldex3D; Acusim AcuSolve; CFD and structural mechanics ISVs; MSC MARC; MSC Nastran; several CAE ISVs.
EDA: electro-magnetics — Agilent, CST, Remcom, SPEAG; Agilent ADS; SPICE simulators; Verilog simulator; lithography products. 67
68 Explosive Growth in CUDA Developers — 200,000 CUDA developer downloads; universities teaching parallel programming using NVIDIA GPUs. 68
69 NVIDIA's Leadership in GPU Computing — applications and papers using NVIDIA GPUs; 10 GPGPU books using NVIDIA GPUs. 69
More informationCS 470 Spring Other Architectures. Mike Lam, Professor. (with an aside on linear algebra)
CS 470 Spring 2016 Mike Lam, Professor Other Architectures (with an aside on linear algebra) Parallel Systems Shared memory (uniform global address space) Primary story: make faster computers Programming
More informationRECENT TRENDS IN GPU ARCHITECTURES. Perspectives of GPU computing in Science, 26 th Sept 2016
RECENT TRENDS IN GPU ARCHITECTURES Perspectives of GPU computing in Science, 26 th Sept 2016 NVIDIA THE AI COMPUTING COMPANY GPU Computing Computer Graphics Artificial Intelligence 2 NVIDIA POWERS WORLD
More informationMAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures
MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures Stan Tomov Innovative Computing Laboratory University of Tennessee, Knoxville OLCF Seminar Series, ORNL June 16, 2010
More informationEE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin
EE382 (20): Computer Architecture - ism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez The University of Texas at Austin 1 Recap 2 Streaming model 1. Use many slimmed down cores to run in parallel
More informationGPU Ray Tracing at the Desktop and in the Cloud. Phillip Miller, NVIDIA Ludwig von Reiche, mental images
GPU Ray Tracing at the Desktop and in the Cloud Phillip Miller, NVIDIA Ludwig von Reiche, mental images Ray Tracing has always had an appeal Ray Tracing Prediction The future of interactive graphics is
More informationGPU on Cloud computing
1 GPU on Cloud computing Heterogeneous Computing SC10 & ISC Updates GPU Computing NVIDIA Korea Professional Solution Group jslee@nvidia.com 2 .,, CPU (CPU). (GPU) 2D 3D. GPU. GPU /. FPGA (Field-programmable
More informationGPU Computing with NVIDIA s new Kepler Architecture
GPU Computing with NVIDIA s new Kepler Architecture Axel Koehler Sr. Solution Architect HPC HPC Advisory Council Meeting, March 13-15 2013, Lugano 1 NVIDIA: Parallel Computing Company GPUs: GeForce, Quadro,
More informationStan Posey, CAE Industry Development NVIDIA, Santa Clara, CA, USA
Stan Posey, CAE Industry Development NVIDIA, Santa Clara, CA, USA NVIDIA and HPC Evolution of GPUs Public, based in Santa Clara, CA ~$4B revenue ~5,500 employees Founded in 1999 with primary business in
More informationWorld s most advanced data center accelerator for PCIe-based servers
NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying
More informationn N c CIni.o ewsrg.au
@NCInews NCI and Raijin National Computational Infrastructure 2 Our Partners General purpose, highly parallel processors High FLOPs/watt and FLOPs/$ Unit of execution Kernel Separate memory subsystem GPGPU
More informationGPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA
GPU COMPUTING AND THE FUTURE OF HPC Timothy Lanfear, NVIDIA ~1 W ~3 W ~100 W ~30 W 1 kw 100 kw 20 MW Power-constrained Computers 2 EXASCALE COMPUTING WILL ENABLE TRANSFORMATIONAL SCIENCE RESULTS First-principles
More informationGPU Clouds IGT Cloud Computing Summit Mordechai Butrashvily, CEO 2009 (c) All rights reserved
GPU Clouds IGT 2009 Cloud Computing Summit Mordechai Butrashvily, CEO moti@hoopoe-cloud.com 02/12/2009 Agenda Introduction to GPU Computing Future GPU architecture GPU on a Cloud: Visualization Computing
More informationTuring Architecture and CUDA 10 New Features. Minseok Lee, Developer Technology Engineer, NVIDIA
Turing Architecture and CUDA 10 New Features Minseok Lee, Developer Technology Engineer, NVIDIA Turing Architecture New SM Architecture Multi-Precision Tensor Core RT Core Turing MPS Inference Accelerated,
More informationHETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA
HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS
More informationWhat is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms
CS 590: High Performance Computing GPU Architectures and CUDA Concepts/Terms Fengguang Song Department of Computer & Information Science IUPUI What is GPU? Conventional GPUs are used to generate 2D, 3D
More informationThe following NVIDIA accelerators are available from HPE, for use in certain HPE ProLiant DL-series, ML-series and SL-series servers.
Overview Hewlett Packard Enterprise supports, on select HPE ProLiant servers, computational accelerator modules based on NVIDIA Tesla, NVIDIA GRID, and NVIDIA Quadro Graphical Processing Unit (GPU) technology.
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware
More information2009: The GPU Computing Tipping Point. Jen-Hsun Huang, CEO
2009: The GPU Computing Tipping Point Jen-Hsun Huang, CEO Someday, our graphics chips will have 1 TeraFLOPS of computing power, will be used for playing games to discovering cures for cancer to streaming
More informationOpenACC Course. Office Hour #2 Q&A
OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle
More informationGPU Lund Observatory
GPU Tutorial @ Lund Observatory Gernot Ziegler, NVIDIA UK HISTORY / INTRODUCTION Parallel vs Sequential Architecture Evolution ILLIAC IV Maspar Blue Gene Cray-1 Thinking Machines High Performance Computing
More informationThe following NVIDIA accelerators are available from HPE, for use in certain HPE ProLiant DL-series, ML-series and SL-series servers.
Overview Hewlett Packard Enterprise supports, on select HPE ProLiant servers, computational accelerator modules based on NVIDIA Tesla, NVIDIA GRID, and NVIDIA Quadro Graphical Processing Unit (GPU) technology.
More informationThe following NVIDIA accelerators are available from HP, for use in certain HPE ProLiant DL-series, ML-series and SL-series servers.
Overview NVIDIA Accelerators for HPE ProLiant Servers Hewlett Packard Enterprise supports, on select HPE ProLiant servers, computational accelerator modules based on NVIDIA Tesla, NVIDIA GRID, and NVIDIA
More informationANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation
ANSYS Improvements to Engineering Productivity with HPC and GPU-Accelerated Simulation Ray Browell nvidia Technology Theater SC12 1 2012 ANSYS, Inc. nvidia Technology Theater SC12 HPC Revolution Recent
More informationPARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort
PARALLEL PROGRAMMING MANY-CORE COMPUTING: INTRO (1/5) Rob van Nieuwpoort rob@cs.vu.nl Schedule 2 1. Introduction, performance metrics & analysis 2. Many-core hardware 3. Cuda class 1: basics 4. Cuda class
More informationTESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications
TESLA P PERFORMANCE GUIDE HPC and Deep Learning Applications MAY 217 TESLA P PERFORMANCE GUIDE Modern high performance computing (HPC) data centers are key to solving some of the world s most important
More informationQUADRO ADVANCED VISUALIZATION. PREVAIL & PREVAIL ELITE SSDs PROFESSIONAL STORAGE -
PNY Professional Solutions QUADRO ADVANCED VISUALIZATION TESLA PARALLEL COMPUTING PREVAIL & PREVAIL ELITE SSDs PROFESSIONAL STORAGE NVS COMMERCIAL GRAPHICS Delivering the broadest range of Professional
More informationS8901 Quadro for AI, VR and Simulation
S8901 Quadro for AI, VR and Simulation Carl Flygare, PNY Quadro Product Marketing Manager Allen Bourgoyne, NVIDIA Senior Product Marketing Manager The question of whether a computer can think is no more
More informationCUDA PROGRAMMING MODEL. Carlo Nardone Sr. Solution Architect, NVIDIA EMEA
CUDA PROGRAMMING MODEL Carlo Nardone Sr. Solution Architect, NVIDIA EMEA CUDA: COMMON UNIFIED DEVICE ARCHITECTURE Parallel computing architecture and programming model GPU Computing Application Includes
More informationThe following NVIDIA accelerators are available from HPE, for use in certain HPE ProLiant DL-series, ML-series and SL-series servers.
Overview Hewlett Packard Enterprise supports, on select HPE ProLiant servers, computational accelerator modules based on NVIDIA Tesla, NVIDIA GRID, and NVIDIA Quadro Graphical Processing Unit (GPU) technology.
More informationPractical Introduction to CUDA and GPU
Practical Introduction to CUDA and GPU Charlie Tang Centre for Theoretical Neuroscience October 9, 2009 Overview CUDA - stands for Compute Unified Device Architecture Introduced Nov. 2006, a parallel computing
More informationIntel Performance Libraries
Intel Performance Libraries Powerful Mathematical Library Intel Math Kernel Library (Intel MKL) Energy Science & Research Engineering Design Financial Analytics Signal Processing Digital Content Creation
More informationThe following NVIDIA accelerators are available from HPE, for use in certain HPE ProLiant DL-series, MLseries and SL-series servers.
Overview Hewlett Packard Enterprise supports, on select HPE ProLiant servers, computational accelerator modules based on NVIDIA Tesla, NVIDIA GRID, and NVIDIA Quadro Graphical Processing Unit (GPU) technology.
More informationTitan - Early Experience with the Titan System at Oak Ridge National Laboratory
Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid
More informationAdvanced CUDA Programming. Dr. Timo Stich
Advanced CUDA Programming Dr. Timo Stich (tstich@nvidia.com) Outline SIMT Architecture, Warps Kernel optimizations Global memory throughput Launch configuration Shared memory access Instruction throughput
More informationTimothy Lanfear, NVIDIA HPC
GPU COMPUTING AND THE Timothy Lanfear, NVIDIA FUTURE OF HPC Exascale Computing will Enable Transformational Science Results First-principles simulation of combustion for new high-efficiency, lowemision
More informationdesigning a GPU Computing Solution
designing a GPU Computing Solution Patrick Van Reeth EMEA HPC Competency Center - GPU Computing Solutions Saturday, May the 29th, 2010 1 2010 Hewlett-Packard Development Company, L.P. The information contained
More informationIntroduction to CUDA CME343 / ME May James Balfour [ NVIDIA Research
Introduction to CUDA CME343 / ME339 18 May 2011 James Balfour [ jbalfour@nvidia.com] NVIDIA Research CUDA Programing system for machines with GPUs Programming Language Compilers Runtime Environments Drivers
More informationWHAT S NEW IN CUDA 8. Siddharth Sharma, Oct 2016
WHAT S NEW IN CUDA 8 Siddharth Sharma, Oct 2016 WHAT S NEW IN CUDA 8 Why Should You Care >2X Run Computations Faster* Solve Larger Problems** Critical Path Analysis * HOOMD Blue v1.3.3 Lennard-Jones liquid
More informationENDURING DIFFERENTIATION. Timothy Lanfear
ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 Transistors (thousands) 1.1X per year 10 3 10 2 Single-threaded
More informationENDURING DIFFERENTIATION Timothy Lanfear
ENDURING DIFFERENTIATION Timothy Lanfear WHERE ARE WE? 2 LIFE AFTER DENNARD SCALING GPU-ACCELERATED PERFORMANCE 10 7 40 Years of Microprocessor Trend Data 10 6 10 5 10 4 10 3 10 2 Single-threaded perf
More informationNVIDIA CUDA Libraries
NVIDIA CUDA Libraries Ujval Kapasi*, Elif Albuz*, Philippe Vandermersch*, Nathan Whitehead*, Frank Jargstorff* San Jose Convention Center Sept 22, 2010 *NVIDIA NVIDIA CUDA Libraries Applications 3 rd Party
More informationAdvanced CUDA Optimization 1. Introduction
Advanced CUDA Optimization 1. Introduction Thomas Bradley Agenda CUDA Review Review of CUDA Architecture Programming & Memory Models Programming Environment Execution Performance Optimization Guidelines
More informationCS427 Multicore Architecture and Parallel Computing
CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:
More informationStatus and Directions of NVIDIA GPUs for Earth System Modeling
Status and Directions of NVIDIA GPUs for Earth System Modeling Stan Posey HPC Industry Development NVIDIA, Santa Clara, CA, USA 1 NVIDIA and HPC Evolution of GPUs Public, based in Santa Clara, CA ~$4B
More informationTechnische Universität München. GPU Programming. Rüdiger Westermann Chair for Computer Graphics & Visualization. Faculty of Informatics
GPU Programming Rüdiger Westermann Chair for Computer Graphics & Visualization Faculty of Informatics Overview Programming interfaces and support libraries The CUDA programming abstraction An in-depth
More informationNvidia Tesla The Personal Supercomputer
International Journal of Allied Practice, Research and Review Website: www.ijaprr.com (ISSN 2350-1294) Nvidia Tesla The Personal Supercomputer Sameer Ahmad 1, Umer Amin 2, Mr. Zubair M Paul 3 1 Student,
More informationHIGH-PERFORMANCE COMPUTING WITH CUDA AND TESLA GPUS
HIGH-PERFORMANCE COMPUTING WITH CUDA AND TESLA GPUS Timothy Lanfear, NVIDIA WHAT IS GPU COMPUTING? What is GPU Computing? x86 PCIe bus GPU Computing with CPU + GPU Heterogeneous Computing Low Latency or
More informationSpeedup Altair RADIOSS Solvers Using NVIDIA GPU
Innovation Intelligence Speedup Altair RADIOSS Solvers Using NVIDIA GPU Eric LEQUINIOU, HPC Director Hongwei Zhou, Senior Software Developer May 16, 2012 Innovation Intelligence ALTAIR OVERVIEW Altair
More informationACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015
ACCELERATED COMPUTING: THE PATH FORWARD Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015 COMMODITY DISRUPTS CUSTOM SOURCE: Top500 ACCELERATED COMPUTING: THE PATH FORWARD It s time to start
More informationHybrid Architectures Why Should I Bother?
Hybrid Architectures Why Should I Bother? CSCS-FoMICS-USI Summer School on Computer Simulations in Science and Engineering Michael Bader July 8 19, 2013 Computer Simulations in Science and Engineering,
More informationCS 470 Spring Other Architectures. Mike Lam, Professor. (with an aside on linear algebra)
CS 470 Spring 2018 Mike Lam, Professor Other Architectures (with an aside on linear algebra) Aside (P3 related): linear algebra Many scientific phenomena can be modeled as matrix operations Differential
More informationCS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology
CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367
More informationSYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA GPUS
SYNERGIE VON HPC UND DEEP LEARNING MIT NVIDIA S Axel Koehler, Principal Solution Architect HPCN%Workshop%Goettingen,%14.%Mai%2018 NVIDIA - AI COMPUTING COMPANY Computer Graphics Computing Artificial Intelligence
More informationMAGMA. Matrix Algebra on GPU and Multicore Architectures
MAGMA Matrix Algebra on GPU and Multicore Architectures Innovative Computing Laboratory Electrical Engineering and Computer Science University of Tennessee Piotr Luszczek (presenter) web.eecs.utk.edu/~luszczek/conf/
More informationGPU Computing with CUDA. Part 2: CUDA Introduction
GPU Computing with CUDA Part 2: CUDA Introduction Dortmund, June 4, 2009 SFB 708, AK "Modellierung und Simulation" Dominik Göddeke Angewandte Mathematik und Numerik TU Dortmund dominik.goeddeke@math.tu-dortmund.de
More informationGPU ACCELERATED COMPUTING. 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation
GPU ACCELERATED COMPUTING 1 st AlsaCalcul GPU Challenge, 14-Jun-2016, Strasbourg Frédéric Parienté, Tesla Accelerated Computing, NVIDIA Corporation GAMING PRO ENTERPRISE VISUALIZATION DATA CENTER AUTO
More informationGPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU.
Basics of s Basics Introduction to Why vs CPU S. Sundar and Computing architecture August 9, 2014 1 / 70 Outline Basics of s Why vs CPU Computing architecture 1 2 3 of s 4 5 Why 6 vs CPU 7 Computing 8
More information