Early 3D Graphics. NVIDIA Corporation Perspective study of a chalice Paolo Uccello, circa 1450

Size: px
Start display at page:

Download "Early 3D Graphics. NVIDIA Corporation Perspective study of a chalice Paolo Uccello, circa 1450"

Transcription

1

2

3 Early 3D Graphics Perspective study of a chalice Paolo Uccello, circa 1450

4 Early Graphics Hardware Artist using a perspective machine Albrecht Dürer, 1525 Perspective study of a chalice Paolo Uccello, circa 1450

5 Early Electronic Graphics Hardware SKETCHPAD: A Man-Machine Graphical Communication System Ivan Sutherland, 1963

6 The Graphics Pipeline The Geometry Engine: A VLSI Geometry System for Graphics Jim Clark, 1982

7 The Graphics Pipeline Vertex Transform & Lighting Triangle Setup & Rasterization Texturing & Pixel Shading Depth Test & Blending

8 The Graphics Pipeline Vertex Transform & Lighting Triangle Setup & Rasterization Texturing & Pixel Shading Depth Test & Blending

9 The Graphics Pipeline Vertex Transform & Lighting Triangle Setup & Rasterization Texturing & Pixel Shading Depth Test & Blending

10 The Graphics Pipeline Vertex Transform & Lighting Triangle Setup & Rasterization Texturing & Pixel Shading Depth Test & Blending

11 The Graphics Pipeline Vertex Transform & Lighting Triangle Setup & Rasterization Texturing & Pixel Shading Depth Test & Blending

12 The Graphics Pipeline Vertex Key abstraction of real-time graphics Rasterize Hardware used to look like this Pixel Test & Blend One chip/board per stage Fixed data flow through pipeline

13 SGI RealityEngine (1993) Vertex Rasterize Pixel Test & Blend!"#$%&'()(*+%!"#$%&'()*%)" +,#-.%/0,%-.//0&12%34+

14 SGI InfiniteReality (1997) Vertex Rasterize Pixel Test & Blend 567$#*8%($%9)+%1)2%)%&"!"#$%&'3454,"#$6&%7"4*,#-.%/040'0&"7,%-.//0&12%3:

15 The Graphics Pipeline Vertex Remains a useful abstraction Rasterize Hardware used to look like this Pixel Test & Blend

16 The Graphics Pipeline Vertex Rasterize!!"#$%&"'&()$*"+)(,-(./"-0)"+$1(231/)"$**1' $644 10' 1 A"'&()$*B*CDC E"76-%FG1.DC ;"76-%FB*CDCH >I1J"A"9I1J"E"=I1JH K Hardware used to look like this: Pixel Vertex, pixel processing became programmable Test & Blend

17 The Graphics Pipeline Vertex Geometry Rasterize Pixel Test & Blend!!"#$%&"'&()$*"+)(,-(./"-0)"+$1(231/)"$**1' $644 10'"1"A"'&()$*B*CDC E"76-%FG1.DC ;"76-%FB*CDCH >I1J"A"9I1J"E"=I1JH K Hardware used to look like this Vertex, pixel processing became programmable New stages added

18 The Graphics Pipeline Vertex Tessellation Geometry Rasterize Pixel!!"#$%&"'&()$*"+)(,-(./"-0)"+$1(231/)"$**1' $644 10'"1"A"'&()$*B*CDC E"76-%FG1.DC ;"76-%FB*CDCH >I1J"A"9I1J"E"=I1JH K Hardware used to look like this Vertex, pixel processing became programmable New stages added Test & Blend GPU architecture increasingly centers around shader execution

19 Modern GPUs: Unified Design Discrete Design Unified Design Shader A Shader B ibuffer ibuffer ibuffer ibuffer Shader Core Shader C obuffer obuffer obuffer obuffer Shader D Vertex shaders, pixel shaders, etc. become threads running different programs on a flexible core

20 L2 L1 TF -($"B%C%09?$(#DE( /(68%;<#(9=%.??"( 1DA()%;<#(9=%.??"(.7B"$%&??(8F)(# 26?$ L1 TF L1 TF L1 TF L1 TF L1 TF L1 TF L1 TF L2 L2 L2 L2 L2 GeForce 8: Modern GPU Architecture

21 L2 L1 TF -($"B%C%09?$(#DE( /(68%;<#(9=%.??"( 1DA()%;<#(9=%.??"(.7B"$%&??(8F)(# 26?$ L1 TF L1 TF L1 TF L1 TF L1 TF L1 TF L1 TF L2 L2 L2 L2 L2 GeForce 8: Modern GPU Architecture

22 Modern GPU Architecture: GT200 -($"B%C%09?$(#DE( /(68%;<#(9=%.??"( 1DA()%;<#(9=%.??"( TF L1 TF L1 TF L1 TF L1 TF L1 ;<#(9=%-><(=")(# L1 L1 L1 L1 L1 TF TF TF TF TF L2 L2 L2 L2 L2 L2 L2 L2

23 Next-Gen GPU Architecture: Fermi DRAM I/F Giga Thread HOST I/F DRAM I/F DRAM I/F DRAM I/F L2 DRAM I/F DRAM I/F NVIDIA next-!"#$%&"'()*$+',-).",./'"

24 GPUs Today Lessons from Graphics Pipeline Throughput is paramount!"#$%&'( )*%+,-./ Create, run, & retire lots of threads very rapidly Use multithreading to hide latency '56 ')*%+,-./ )% 68*%+,-./ &'5*%+,-./ %((88 6(&*%+,-./!"#$%&' )9 +,-./!""# $%%% $%%# $%!%

25

26 Perspective 1 TeraFLOP in 1993

27 !"#$%&#'($)**+#,-$./' Computers no longer get faster, just wider You must re-think your algorithms to be parallel! Data-parallel computing is most scalable solution

28 Why GPU Computing? Fact: Throughput nobody architecture cares about! massive theoretical parallelism peak Massive parallelism! Challenge: computational horsepower harness GPU power for real application performance GPU GFLOPS $"# 0&12345,-/&89*:;)!"#!"#$#%&'()*%&+,-.- /0-& &89*:;) CPU

29 CUDA Successes 146X 36X 18X 50X 100X Medical Imaging U of Utah Molecular Dynamics U of Illinois Video Transcoding Elemental Tech Matlab Computing AccelerEyes Astrophysics RIKEN 149X 47X 20X 130X 30X Financial simulation Oxford Linear Algebra Universidad Jaime 3D Ultrasound Techniscan Quantum Chemistry U of Illinois Gene Sequencing U of Maryland

30 Accelerating Insight 4.6 Days 2.7 Days 8 Hours 3 Hours 27 Minutes 30 Minutes 13 Minutes 16 Minutes CPU Only Heterogeneous with Tesla GPU

31

32 GPU Computing 1.0 (Ignoring prehistory: Ikonas, Pixel Machine, Pixel-01/2#-34 GPU Computing 1.0: compute pretending to be graphics Disguise data as textures or geometry Disguise algorithm as render passes!trick graphics pipeline into doing your computation! Term GPGPU coined by Mark Harris

33 Typical GPGPU Constructs

34 Typical GPGPU Constructs

35 GPGPU Hardware & Algorithms GPUs get progressively more capable Fixed-function! register combiners! shaders fp32 pixel hardware greatly extends reach Algorithms get more sophisticated Cellular automata! PDE solvers! ray tracing Clever graphics tricks High-level shading languages emerge

36 GPU Computing Enter CUDA GPU Computing 2.0: direct compute Program GPU directly, no graphics-based restrictions GPU Computing supplants graphics-based GPGPU November 2006: NVIDIA introduces CUDA

37 CUDA In One Slide Thread B(#G$<#(9= )6>9)%8(86#* Local barrier Block B(#GF)6>'?<9#(= 8(86#* Kernel &''$% Global barrier... Kernel!"#$%... B(#G=(HD>( I)6F9) 8(86#*

38 CUDA C Example!"#$%&'()*+&,-#'./#01%02%3."'1%'2%3."'1%4(2%3."'1%4*5 6 3"- /#01%#%7%89%# : 09%;;#5 *<#=%7%'4(<#=%;%*<#=9 > Serial C Code??%@0!"A,%&,-#'. BCDEF%A,-0,. &'()*+&,-#'./02%GH82%(2%*59 ++I."J'.++%!"#$%&'()*+)'-'..,./#01%02%3."'1%'2%3."'1%4(2%3."'1%4*5 6 #01%#%7%J."KA@$(H(4J."KAL#MH(%;%1N-,'$@$(H(9 #3 /# : 05%%*<#=%7%'4(<#=%;%*<#=9 >??%@0!"A,%)'-'..,. BCDEF%A,-0,. O#1N%GPQ%1N-,'$&?J."KA #01%0J."KA&%7%/0%;%GPP5%?%GPQ9 &'()*+)'-'..,.:::0J."KA&2%GPQRRR/02%GH82%(2%*59 Parallel C Code

39 Heterogeneous Programming Use the right processor for the right job Serial Code Parallel Kernel &''((()*+,-.)*/01)222$"#34%5... Serial Code Parallel Kernel!"#((()*+,-.)*/01)222$"#34%5...

40 GPU Computing An Ecosystem GPU Computing 3.0: an emerging ecosystem Hardware & product lines Education & research Algorithmic sophistication Consumer applications Cross-platform standards High-level languages!"#$%&'()*$+,$-./)(*01$$%&0()2$3,2&*4$56/+7,89*4$/55*

41 Products CUDA is in products from laptops to supercomputers `

42 Emerging HPC Products New class of hybrid GPU-CPU servers 2 Tesla M1060 GPUs Upto 18 Tesla M1060 GPUs SuperMicro 1U GPU Server Bull Bullx Blade Enclosure

43 Algorithmic Sophistication Sort Sparse matrix Hash tables Fast multipole method Ray tracing (parallel tree traversal) 012$3"4/5.4$6'7($%89.)():+.)7#$76$;9+'4"$<+.')=-Vector Multiplication on Emerging Multicore Platforms", Williams et al, Supercomputing 2007

44 Cross-Platform Standards GPU Computing Applications CUDA C OpenCL tm Direct Compute CUDA Fortran Java Python.NET MATLAB 3 NVIDIA GPU with the CUDA Parallel Computing Architecture OpenCL is trademark of Apple Inc. used under license to the Khronos Group Inc.

45 CUDA Momentum Over 250 universities teach CUDA Over 1000 research papers CUDA powered TSUBAME 29 th fastest supercomputer in the world 180 Million CUDA GPUs 100,000 active developers Over 500 CUDA Apps

46 thrust Data Structures!!"#$%!&&'()*+(,)(+!-#!!"#$%!&&"-%!,)(+!-#!!"#$%!&&'()*+(,.!#! Etc. Algorithms!!"#$%!&&%-#!!!"#$%!&&#('$+(!!"#$%!&&(/+0$%*)(,%+12! Etc. thrust: a library of data parallel algorithms & data structures with an interface similar to the C++ Standard Template Library for CUDA C++ template metaprogramming automatically chooses the fastest code path at compile time

47 thrust::sort sort.cu 60*7,819)(:;#84:<;'4:=>97:'#?;2 60*7,819)(:;#84:<19>079=>97:'#?;2 60*7,819)(:;#84:<39*9#":9?;2 60*7,819)(:;#84:<4'#:?;2 60*7,819)(74:1,0!2 A <<)39*9#":9)#"*1'@)1":")'*):;9);'4: :;#84:BB;'4:=>97:'#(0*:2);=>97$CDDDDDD%5 :;#84:BB39*9#":9$;=>97?!930*$%.);=>97?9*1$%.)#"*1%5 <<):#"*4&9#):')19>079)"*1)4'#: :;#84:BB19>079=>97:'#(0*:2)1=>97 E);=>975 <<)4'#:)CFDG)HI!)-9J4<497)'*)K/IDD :;#84:BB4'#:$1=>97?!930*$%.)1=>97?9*1$%%5 L #9:8#*)D5

48 GPU Begins to Vanish Ever-increasing number of codes & commercial Some bio/chem codes available or porting: NAMD / VMD, GROMACS (alpha), HOOMD, GPU HMMER, MUMmerGPU, AutoDock3 LAMMPS, CHARMM, Q-ChemA$>/;--@/2A$B)CDE3 Consumer applications, e.g. Photoshop Badaboom, vreveal, Nero MoveIt OS: Microsoft Windows 7, MacOS Snowleapord

49

50 Next-Gen GPU Architecture: Fermi 3 billion transistors Over 2x the cores (512 total) ~2x the memory bandwidth L1 and L2 caches 8x the peak DP performance ECC C++

51 SM Microarchitecture Instruction Cache Objective 5 optimize for GPU computing New ISA Revamp issue / control flow New CUDA core architecture 32 cores per SM (512 total) 64KB of configurable L1$ / shared memory Scheduler Dispatch Scheduler Dispatch Register File Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Load/Store Units x 16 Special Func Units x 4 FP32 FP64 INT SFU LD/ST Ops / clk Interconnect Network 64K Configurable Cache/Shared Mem Uniform Cache

52 SM Microarchitecture New IEEE arithmetic standard Fused Multiply-Add (FMA) for & DP New integer ALU optimized for 64-bit and extended precision ops CUDA Core Dispatch Port Operand Collector FP Unit INT Unit Result Queue Instruction Cache Scheduler Dispatch Scheduler Dispatch Register File Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Load/Store Units x 16 Special Func Units x 4 Interconnect Network 64K Configurable Cache/Shared Mem Uniform Cache

53 Hardware Thread Scheduling Concurrent kernel execution + faster context switch Kernel 1 Kernel 1 Kernel 2 Kernel 2 Kernel 2 Kernel 3 Ker 4 Time Kernel 2 Kernel 3 nel Kernel 5 Kernel 4 Kernel 5 Serial Kernel Execution Parallel Kernel Execution

54 More Fermi Goodness ECC protection for DRAM, L2, L1, RF Unified 40-bit address space for local, shared, global 5-20x faster atomics Dual DMA engines for CPU"! GPU transfers ISA extensions for C++ (e.g. virtual functions)

55

56 Key CUDA Challenges Express other programming models elegantly Producer-consumer: persistent thread blocks reading and writing work queues Task-parallel: kernel, thread block or warp as parallel task C*<"$7*FF*2$%6*'#+-;-#+($G?HB$6/<<#+2- Foster more high-level languages & platforms Improve & mature development environment

57 Key GPU Workloads Computational graphics Scientific and numeric computing Image processing 5 video & images Computer vision Speech & natural language Machine learning

58 The Future of Computing? Forward-looking statements: All future interesting problems are throughput problems. GPUs will evolve to be the general-purpose throughput processors. CPUs as we know them will become (already are?)

59 Final Thoughts 5 Education We should teach parallel computing in CS 1 or CS 2 G*F6;<#+-$I*2,<$9#<$=/-<#+A$:;-<$'@I#+ now Manycore is the future of computing Heapsort and mergesort Both O(n lg n) One parallel-friendly, one not Students need to understand this early

60

61 Miscellaneous Thoughts

62 NVIDIA Resources Available NVIDIA enables heterogeneous computing research and teaching: Grants for leading researchers in GPU Computing Ph.D. Fellowship program Professor Partnership program Discounted hardware GPU Computing Ventures 5 Venture fund focused on GPU computing Technical Training and Assistance

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

Tesla GPU Computing A Revolution in High Performance Computing

Tesla GPU Computing A Revolution in High Performance Computing Tesla GPU Computing A Revolution in High Performance Computing Mark Harris, NVIDIA Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction to Tesla CUDA Architecture Programming & Memory

More information

Accelerating High Performance Computing.

Accelerating High Performance Computing. Accelerating High Performance Computing http://www.nvidia.com/tesla Computing The 3 rd Pillar of Science Drug Design Molecular Dynamics Seismic Imaging Reverse Time Migration Automotive Design Computational

More information

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin EE382 (20): Computer Architecture - ism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez The University of Texas at Austin 1 Recap 2 Streaming model 1. Use many slimmed down cores to run in parallel

More information

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G

G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Joined Advanced Student School (JASS) 2009 March 29 - April 7, 2009 St. Petersburg, Russia G P G P U : H I G H - P E R F O R M A N C E C O M P U T I N G Dmitry Puzyrev St. Petersburg State University Faculty

More information

HIGH-PERFORMANCE COMPUTING

HIGH-PERFORMANCE COMPUTING HIGH-PERFORMANCE COMPUTING WITH NVIDIA TESLA GPUS Timothy Lanfear, NVIDIA WHY GPU COMPUTING? Science is Desperate for Throughput Gigaflops 1,000,000,000 1 Exaflop 1,000,000 1 Petaflop Bacteria 100s of

More information

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8 Markus Hadwiger, KAUST Reading Assignment #5 (until March 12) Read (required): Programming Massively Parallel Processors book, Chapter

More information

Tesla GPU Computing A Revolution in High Performance Computing

Tesla GPU Computing A Revolution in High Performance Computing Tesla GPU Computing A Revolution in High Performance Computing Gernot Ziegler, Developer Technology (Compute) (Material by Thomas Bradley) Agenda Tesla GPU Computing CUDA Fermi What is GPU Computing? Introduction

More information

The Graphics Pipeline: Evolution of the GPU!

The Graphics Pipeline: Evolution of the GPU! 1 Today The Graphics Pipeline: Evolution of the GPU! Bigger picture: Parallel processor designs! Throughput-optimized (GPU-like)! Latency-optimized (Multicore CPU-like)! A look at NVIDIA s Fermi GPU architecture!

More information

CSCI-GA Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs

CSCI-GA Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com History of GPUs

More information

Threading Hardware in G80

Threading Hardware in G80 ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &

More information

HIGH-PERFORMANCE COMPUTING WITH NVIDIA TESLA GPUS. Chris Butler NVIDIA

HIGH-PERFORMANCE COMPUTING WITH NVIDIA TESLA GPUS. Chris Butler NVIDIA HIGH-PERFORMANCE COMPUTING WITH NVIDIA TESLA GPUS Chris Butler NVIDIA Science is Desperate for Throughput Gigaflops 1,000,000,000 1 Exaflop 1,000,000 1 Petaflop Bacteria 100s of Chromatophores Chromatophore

More information

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

NVIDIA Fermi Architecture

NVIDIA Fermi Architecture Administrivia NVIDIA Fermi Architecture Patrick Cozzi University of Pennsylvania CIS 565 - Spring 2011 Assignment 4 grades returned Project checkpoint on Monday Post an update on your blog beforehand Poster

More information

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

GRAPHICS PROCESSING UNITS

GRAPHICS PROCESSING UNITS GRAPHICS PROCESSING UNITS Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 4, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011

More information

General Purpose GPU Computing in Partial Wave Analysis

General Purpose GPU Computing in Partial Wave Analysis JLAB at 12 GeV - INT General Purpose GPU Computing in Partial Wave Analysis Hrayr Matevosyan - NTC, Indiana University November 18/2009 COmputationAL Challenges IN PWA Rapid Increase in Available Data

More information

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 588/688. Graphics Processors Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly

More information

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367

More information

The NVIDIA GeForce 8800 GPU

The NVIDIA GeForce 8800 GPU The NVIDIA GeForce 8800 GPU August 2007 Erik Lindholm / Stuart Oberman Outline GeForce 8800 Architecture Overview Streaming Processor Array Streaming Multiprocessor Texture ROP: Raster Operation Pipeline

More information

Technical Report on IEIIT-CNR

Technical Report on IEIIT-CNR Technical Report on Architectural Evolution of NVIDIA GPUs for High-Performance Computing (IEIIT-CNR-150212) Angelo Corana (Decision Support Methods and Models Group) IEIIT-CNR Istituto di Elettronica

More information

Mattan Erez. The University of Texas at Austin

Mattan Erez. The University of Texas at Austin EE382V (17325): Principles in Computer Architecture Parallelism and Locality Fall 2007 Lecture 12 GPU Architecture (NVIDIA G80) Mattan Erez The University of Texas at Austin Outline 3D graphics recap and

More information

Introduction to CUDA (1 of n*)

Introduction to CUDA (1 of n*) Agenda Introduction to CUDA (1 of n*) GPU architecture review CUDA First of two or three dedicated classes Joseph Kider University of Pennsylvania CIS 565 - Spring 2011 * Where n is 2 or 3 Acknowledgements

More information

1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7.

1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7. 1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7. Optical Discs 1 Structure of a Graphics Adapter Video Memory Graphics

More information

Introduction to CUDA

Introduction to CUDA Introduction to CUDA Oliver Meister November 7 th 2012 Tutorial Parallel Programming and High Performance Computing, November 7 th 2012 1 References D. Kirk, W. Hwu: Programming Massively Parallel Processors,

More information

GRAPHICS HARDWARE. Niels Joubert, 4th August 2010, CS147

GRAPHICS HARDWARE. Niels Joubert, 4th August 2010, CS147 GRAPHICS HARDWARE Niels Joubert, 4th August 2010, CS147 Rendering Latest GPGPU Today Enabling Real Time Graphics Pipeline History Architecture Programming RENDERING PIPELINE Real-Time Graphics Vertices

More information

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1 Lecture 15: Introduction to GPU programming Lecture 15: Introduction to GPU programming p. 1 Overview Hardware features of GPGPU Principles of GPU programming A good reference: David B. Kirk and Wen-mei

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010 Introduction to Multicore architecture Tao Zhang Oct. 21, 2010 Overview Part1: General multicore architecture Part2: GPU architecture Part1: General Multicore architecture Uniprocessor Performance (ECint)

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied

More information

GPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU.

GPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU. Basics of s Basics Introduction to Why vs CPU S. Sundar and Computing architecture August 9, 2014 1 / 70 Outline Basics of s Why vs CPU Computing architecture 1 2 3 of s 4 5 Why 6 vs CPU 7 Computing 8

More information

Introduction to GPU Computing. 周国峰 Wuhan University 2017/10/13

Introduction to GPU Computing. 周国峰 Wuhan University 2017/10/13 Introduction to GPU Computing chandlerz@nvidia.com 周国峰 Wuhan University 2017/10/13 GPU and Its Application 3 Ways to Develop Your GPU APP An Example to Show the Developments Add GPUs: Accelerate Science

More information

GPU Programming Using NVIDIA CUDA

GPU Programming Using NVIDIA CUDA GPU Programming Using NVIDIA CUDA Siddhante Nangla 1, Professor Chetna Achar 2 1, 2 MET s Institute of Computer Science, Bandra Mumbai University Abstract: GPGPU or General-Purpose Computing on Graphics

More information

Lecture 1: Introduction and Computational Thinking

Lecture 1: Introduction and Computational Thinking PASI Summer School Advanced Algorithmic Techniques for GPUs Lecture 1: Introduction and Computational Thinking 1 Course Objective To master the most commonly used algorithm techniques and computational

More information

GPU Fundamentals Jeff Larkin November 14, 2016

GPU Fundamentals Jeff Larkin November 14, 2016 GPU Fundamentals Jeff Larkin , November 4, 206 Who Am I? 2002 B.S. Computer Science Furman University 2005 M.S. Computer Science UT Knoxville 2002 Graduate Teaching Assistant 2005 Graduate

More information

HIGH-PERFORMANCE COMPUTING WITH NVIDIA TESLA GPUS. Timothy Lanfear, NVIDIA

HIGH-PERFORMANCE COMPUTING WITH NVIDIA TESLA GPUS. Timothy Lanfear, NVIDIA HIGH-PERFORMANCE COMPUTING WITH NVIDIA TESLA GPUS Timothy Lanfear, NVIDIA WHY GPU COMPUTING? Science is Desperate for Throughput Gigaflops 1,000,000,000 1 Exaflop 1,000,000 1 Petaflop Bacteria 100s of

More information

Current Trends in Computer Graphics Hardware

Current Trends in Computer Graphics Hardware Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)

More information

Graphics and Imaging Architectures

Graphics and Imaging Architectures Graphics and Imaging Architectures Kayvon Fatahalian http://www.cs.cmu.edu/afs/cs/academic/class/15869-f11/www/ About Kayvon New faculty, just arrived from Stanford Dissertation: Evolving real-time graphics

More information

2009: The GPU Computing Tipping Point. Jen-Hsun Huang, CEO

2009: The GPU Computing Tipping Point. Jen-Hsun Huang, CEO 2009: The GPU Computing Tipping Point Jen-Hsun Huang, CEO Someday, our graphics chips will have 1 TeraFLOPS of computing power, will be used for playing games to discovering cures for cancer to streaming

More information

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono Introduction to CUDA Algoritmi e Calcolo Parallelo References This set of slides is mainly based on: CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory Slide of Applied

More information

GPU A rchitectures Architectures Patrick Neill May

GPU A rchitectures Architectures Patrick Neill May GPU Architectures Patrick Neill May 30, 2014 Outline CPU versus GPU CUDA GPU Why are they different? Terminology Kepler/Maxwell Graphics Tiled deferred rendering Opportunities What skills you should know

More information

Accelerator cards are typically PCIx cards that supplement a host processor, which they require to operate Today, the most common accelerators include

Accelerator cards are typically PCIx cards that supplement a host processor, which they require to operate Today, the most common accelerators include 3.1 Overview Accelerator cards are typically PCIx cards that supplement a host processor, which they require to operate Today, the most common accelerators include GPUs (Graphics Processing Units) AMD/ATI

More information

Computer Architecture 计算机体系结构. Lecture 10. Data-Level Parallelism and GPGPU 第十讲 数据级并行化与 GPGPU. Chao Li, PhD. 李超博士

Computer Architecture 计算机体系结构. Lecture 10. Data-Level Parallelism and GPGPU 第十讲 数据级并行化与 GPGPU. Chao Li, PhD. 李超博士 Computer Architecture 计算机体系结构 Lecture 10. Data-Level Parallelism and GPGPU 第十讲 数据级并行化与 GPGPU Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2017 Review Thread, Multithreading, SMT CMP and multicore Benefits of

More information

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes. HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation

More information

What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others

What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University * slides thanks to Kavita Bala & many others Final Project Demo Sign-Up: Will be posted outside my office after lecture today.

More information

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008 Michael Doggett Graphics Architecture Group April 2, 2008 Graphics Processing Unit Architecture CPUs vsgpus AMD s ATI RADEON 2900 Programming Brook+, CAL, ShaderAnalyzer Architecture Challenges Accelerated

More information

Exotic Methods in Parallel Computing [GPU Computing]

Exotic Methods in Parallel Computing [GPU Computing] Exotic Methods in Parallel Computing [GPU Computing] Frank Feinbube Exotic Methods in Parallel Computing Dr. Peter Tröger Exotic Methods in Parallel Computing FF 2012 Architectural Shift 2 Exotic Methods

More information

Bifrost - The GPU architecture for next five billion

Bifrost - The GPU architecture for next five billion Bifrost - The GPU architecture for next five billion Hessed Choi Senior FAE / ARM ARM Tech Forum June 28 th, 2016 Vulkan 2 ARM 2016 What is Vulkan? A 3D graphics API for the next twenty years Logical successor

More information

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics Why GPU? Chapter 1 Graphics Hardware Graphics Processing Unit (GPU) is a Subsidiary hardware With massively multi-threaded many-core Dedicated to 2D and 3D graphics Special purpose low functionality, high

More information

Hybrid Architectures Why Should I Bother?

Hybrid Architectures Why Should I Bother? Hybrid Architectures Why Should I Bother? CSCS-FoMICS-USI Summer School on Computer Simulations in Science and Engineering Michael Bader July 8 19, 2013 Computer Simulations in Science and Engineering,

More information

Spring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett

Spring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett Spring 2010 Prof. Hyesoon Kim AMD presentations from Richard Huddy and Michael Doggett Radeon 2900 2600 2400 Stream Processors 320 120 40 SIMDs 4 3 2 Pipelines 16 8 4 Texture Units 16 8 4 Render Backens

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Today Finishing up from last time Brief discussion of graphics workload metrics

More information

High Performance Computing on GPUs using NVIDIA CUDA

High Performance Computing on GPUs using NVIDIA CUDA High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and

More information

Introduction to GPU Computing Junjie Lai, NVIDIA Corporation

Introduction to GPU Computing Junjie Lai, NVIDIA Corporation Introduction to GPU Computing Junjie Lai, NVIDIA Corporation Outline Evolution of GPU Computing Heterogeneous Computing CUDA Execution Model & Walkthrough of Hello World Walkthrough : 1D Stencil Once upon

More information

GPU Architecture. Michael Doggett Department of Computer Science Lund university

GPU Architecture. Michael Doggett Department of Computer Science Lund university GPU Architecture Michael Doggett Department of Computer Science Lund university GPUs from my time at ATI R200 Xbox360 GPU R630 R610 R770 Let s start at the beginning... Graphics Hardware before GPUs 1970s

More information

Antonio R. Miele Marco D. Santambrogio

Antonio R. Miele Marco D. Santambrogio Advanced Topics on Heterogeneous System Architectures GPU Politecnico di Milano Seminar Room A. Alario 18 November, 2015 Antonio R. Miele Marco D. Santambrogio Politecnico di Milano 2 Introduction First

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model Parallel Programming Principle and Practice Lecture 9 Introduction to GPGPUs and CUDA Programming Model Outline Introduction to GPGPUs and Cuda Programming Model The Cuda Thread Hierarchy / Memory Hierarchy

More information

A Data-Parallel Genealogy: The GPU Family Tree. John Owens University of California, Davis

A Data-Parallel Genealogy: The GPU Family Tree. John Owens University of California, Davis A Data-Parallel Genealogy: The GPU Family Tree John Owens University of California, Davis Outline Moore s Law brings opportunity Gains in performance and capabilities. What has 20+ years of development

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

Spring 2009 Prof. Hyesoon Kim

Spring 2009 Prof. Hyesoon Kim Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Lecture 1: Gentle Introduction to GPUs

Lecture 1: Gentle Introduction to GPUs CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 1: Gentle Introduction to GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Who Am I? Mohamed

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

Graphics Processing Unit Architecture (GPU Arch)

Graphics Processing Unit Architecture (GPU Arch) Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics

More information

GPU ARCHITECTURE Chris Schultz, June 2017

GPU ARCHITECTURE Chris Schultz, June 2017 GPU ARCHITECTURE Chris Schultz, June 2017 MISC All of the opinions expressed in this presentation are my own and do not reflect any held by NVIDIA 2 OUTLINE CPU versus GPU Why are they different? CUDA

More information

What is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms

What is GPU? CS 590: High Performance Computing. GPU Architectures and CUDA Concepts/Terms CS 590: High Performance Computing GPU Architectures and CUDA Concepts/Terms Fengguang Song Department of Computer & Information Science IUPUI What is GPU? Conventional GPUs are used to generate 2D, 3D

More information

Multipredicate Join Algorithms for Accelerating Relational Graph Processing on GPUs

Multipredicate Join Algorithms for Accelerating Relational Graph Processing on GPUs Multipredicate Join Algorithms for Accelerating Relational Graph Processing on GPUs Haicheng Wu 1, Daniel Zinn 2, Molham Aref 2, Sudhakar Yalamanchili 1 1. Georgia Institute of Technology 2. LogicBlox

More information

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture

More information

GPU Architecture. Samuli Laine NVIDIA Research

GPU Architecture. Samuli Laine NVIDIA Research GPU Architecture Samuli Laine NVIDIA Research Today The graphics pipeline: Evolution of the GPU Throughput-optimized parallel processor design I.e., the GPU Contrast with latency-optimized (CPU-like) design

More information

Advanced CUDA Programming. Dr. Timo Stich

Advanced CUDA Programming. Dr. Timo Stich Advanced CUDA Programming Dr. Timo Stich (tstich@nvidia.com) Outline SIMT Architecture, Warps Kernel optimizations Global memory throughput Launch configuration Shared memory access Instruction throughput

More information

Mathematical computations with GPUs

Mathematical computations with GPUs Master Educational Program Information technology in applications Mathematical computations with GPUs GPU architecture Alexey A. Romanenko arom@ccfit.nsu.ru Novosibirsk State University GPU Graphical Processing

More information

Programming GPUs with CUDA. Prerequisites for this tutorial. Commercial models available for Kepler: GeForce vs. Tesla. I.

Programming GPUs with CUDA. Prerequisites for this tutorial. Commercial models available for Kepler: GeForce vs. Tesla. I. Programming GPUs with CUDA Tutorial at 1th IEEE CSE 15 and 13th IEEE EUC 15 conferences Prerequisites for this tutorial Porto (Portugal). October, 20th, 2015 You (probably) need experience with C. You

More information

GPU Programming. Lecture 2: CUDA C Basics. Miaoqing Huang University of Arkansas 1 / 34

GPU Programming. Lecture 2: CUDA C Basics. Miaoqing Huang University of Arkansas 1 / 34 1 / 34 GPU Programming Lecture 2: CUDA C Basics Miaoqing Huang University of Arkansas 2 / 34 Outline Evolvements of NVIDIA GPU CUDA Basic Detailed Steps Device Memories and Data Transfer Kernel Functions

More information

Computer Architecture

Computer Architecture Jens Teubner Computer Architecture Summer 2017 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2017 Jens Teubner Computer Architecture Summer 2017 34 Part II Graphics

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits

More information

Windowing System on a 3D Pipeline. February 2005

Windowing System on a 3D Pipeline. February 2005 Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April

More information

Technology for a better society. hetcomp.com

Technology for a better society. hetcomp.com Technology for a better society hetcomp.com 1 J. Seland, C. Dyken, T. R. Hagen, A. R. Brodtkorb, J. Hjelmervik,E Bjønnes GPU Computing USIT Course Week 16th November 2011 hetcomp.com 2 9:30 10:15 Introduction

More information

GPU Computing with Fornax. Dr. Christopher Harris

GPU Computing with Fornax. Dr. Christopher Harris GPU Computing with Fornax Dr. Christopher Harris ivec@uwa CAASTRO GPU Training Workshop 8-9 October 2012 Introducing the Historical GPU Graphics Processing Unit (GPU) n : A specialised electronic circuit

More information

ADVANCES IN EXTREME-SCALE APPLICATIONS ON GPU. Peng Wang HPC Developer Technology

ADVANCES IN EXTREME-SCALE APPLICATIONS ON GPU. Peng Wang HPC Developer Technology ADVANCES IN EXTREME-SCALE APPLICATIONS ON GPU Peng Wang HPC Developer Technology NVIDIA SuperPhones to SuperComputers Computers no longer get faster, just wider Architectural Features Common to All Processors

More information

GPGPU. Peter Laurens 1st-year PhD Student, NSC

GPGPU. Peter Laurens 1st-year PhD Student, NSC GPGPU Peter Laurens 1st-year PhD Student, NSC Presentation Overview 1. What is it? 2. What can it do for me? 3. How can I get it to do that? 4. What s the catch? 5. What s the future? What is it? Introducing

More information

B. Tech. Project Second Stage Report on

B. Tech. Project Second Stage Report on B. Tech. Project Second Stage Report on GPU Based Active Contours Submitted by Sumit Shekhar (05007028) Under the guidance of Prof Subhasis Chaudhuri Table of Contents 1. Introduction... 1 1.1 Graphic

More information

CSC573: TSHA Introduction to Accelerators

CSC573: TSHA Introduction to Accelerators CSC573: TSHA Introduction to Accelerators Sreepathi Pai September 5, 2017 URCS Outline Introduction to Accelerators GPU Architectures GPU Programming Models Outline Introduction to Accelerators GPU Architectures

More information

CUDA. Matthew Joyner, Jeremy Williams

CUDA. Matthew Joyner, Jeremy Williams CUDA Matthew Joyner, Jeremy Williams Agenda What is CUDA? CUDA GPU Architecture CPU/GPU Communication Coding in CUDA Use cases of CUDA Comparison to OpenCL What is CUDA? What is CUDA? CUDA is a parallel

More information

Using GPUs to compute the multilevel summation of electrostatic forces

Using GPUs to compute the multilevel summation of electrostatic forces Using GPUs to compute the multilevel summation of electrostatic forces David J. Hardy Theoretical and Computational Biophysics Group Beckman Institute for Advanced Science and Technology University of

More information

CUDA Programming Model

CUDA Programming Model CUDA Xing Zeng, Dongyue Mou Introduction Example Pro & Contra Trend Introduction Example Pro & Contra Trend Introduction What is CUDA? - Compute Unified Device Architecture. - A powerful parallel programming

More information

CMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN

CMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN CMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN Graphics Processing Unit Accelerate the creation of images in a frame buffer intended for the output

More information

Fundamental CUDA Optimization. NVIDIA Corporation

Fundamental CUDA Optimization. NVIDIA Corporation Fundamental CUDA Optimization NVIDIA Corporation Outline Fermi/Kepler Architecture Kernel optimizations Launch configuration Global memory throughput Shared memory access Instruction throughput / control

More information

CUDA Architecture & Programming Model

CUDA Architecture & Programming Model CUDA Architecture & Programming Model Course on Multi-core Architectures & Programming Oliver Taubmann May 9, 2012 Outline Introduction Architecture Generation Fermi A Brief Look Back At Tesla What s New

More information

ECE 8823: GPU Architectures. Objectives

ECE 8823: GPU Architectures. Objectives ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading

More information

CUDA programming model. N. Cardoso & P. Bicudo. Física Computacional (FC5)

CUDA programming model. N. Cardoso & P. Bicudo. Física Computacional (FC5) CUDA programming model N. Cardoso & P. Bicudo Física Computacional (FC5) N. Cardoso & P. Bicudo CUDA programming model 1/23 Outline 1 CUDA qualifiers 2 CUDA Kernel Thread hierarchy Kernel, configuration

More information

Lecture 7: The Programmable GPU Core. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Lecture 7: The Programmable GPU Core. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011) Lecture 7: The Programmable GPU Core Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011) Today A brief history of GPU programmability Throughput processing core 101 A detailed

More information

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1 Markus Hadwiger, KAUST Reading Assignment #2 (until Feb. 17) Read (required): GLSL book, chapter 4 (The OpenGL Programmable

More information

Lecture 5. Performance programming for stencil methods Vectorization Computing with GPUs

Lecture 5. Performance programming for stencil methods Vectorization Computing with GPUs Lecture 5 Performance programming for stencil methods Vectorization Computing with GPUs Announcements Forge accounts: set up ssh public key, tcsh Turnin was enabled for Programming Lab #1: due at 9pm today,

More information

Sparse Linear Algebra in CUDA

Sparse Linear Algebra in CUDA Sparse Linear Algebra in CUDA HPC - Algorithms and Applications Alexander Pöppl Technical University of Munich Chair of Scientific Computing November 22 nd 2017 Table of Contents Homework - Worksheet 2

More information

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture

More information

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27 1 / 27 GPU Programming Lecture 1: Introduction Miaoqing Huang University of Arkansas 2 / 27 Outline Course Introduction GPUs as Parallel Computers Trend and Design Philosophies Programming and Execution

More information

World s most advanced data center accelerator for PCIe-based servers

World s most advanced data center accelerator for PCIe-based servers NVIDIA TESLA P100 GPU ACCELERATOR World s most advanced data center accelerator for PCIe-based servers HPC data centers need to support the ever-growing demands of scientists and researchers while staying

More information