GRAPHICS HARDWARE. Niels Joubert, 4th August 2010, CS147

Size: px
Start display at page:

Download "GRAPHICS HARDWARE. Niels Joubert, 4th August 2010, CS147"

Transcription

1 GRAPHICS HARDWARE Niels Joubert, 4th August 2010, CS147

2 Rendering Latest GPGPU Today Enabling Real Time Graphics Pipeline History Architecture Programming

3 RENDERING PIPELINE Real-Time Graphics

4 Vertices Rendering Pipeline Vertex Processing are transformed into screen space

5 Vertices Rendering Pipeline Vertex Processing are transformed into screen space Each vertex is transformed independently! Vertex Processor Vertex Processor Vertex Processor Vertex Processor Vertex Processor Vertex Processor

6 Vertices Primitives Rendering Pipeline Primitive Processing are organized into primitives are clipped and culled

7 Rendering Pipeline Rasterization Primitives are rasterized into pixel fragments.

8 Rendering Pipeline Rasterization Each primitive is rasterized Primitives are rasterized into pixel fragments. independently!

9 Fragments Rendering Pipeline Fragment Processing are shaded to compute a color at a pixel

10 Fragments Rendering Pipeline Fragment Processing Each fragment is shaded independently! are shaded to compute a color at a pixel

11 Fragments Z-Buffer Rendering Pipeline Pixel Operations are blended into the framebuffer determines visibility

12 Memory Many All Conflicts Revisit Rendering Pipeline Frame buffer location with aggregation ability fragments end up on same pixel fragments are handled independently when writing to framebuffer? this later! [Synchronization / Atomics]

13 Rendering Pipeline Pipeline Entities

14 Rendering Pipeline Graphics Pipeline Vertices Vertex Generation Vertex Processing Vertex Buffers Transformations Vertex Shader 3D to Screen Space Position, Color, Texture Coordinate manipulations Primitives Primitive Generation Primitive Processing Polygon Data Geometry Shader Interpolate Variables Add or Remove Vertices Fragments Fragment Generation Fragment Processing Fragment shader Interpolate & Rast Lighting Effects Pixels Pixel Operations Z-Buffer Blend Producer-Consumer

15 HISTORY How we got to where we are

16 History: 1963 Sketchpad The first GUI.

17 History: 1972 Xerox Alto 1972: Mouse, Keyboard, Files, Folders, Draggable windows Personal Computer

18 Launched Did History: 1977 Apple II Mhz 6502 processor 4KB RAM (expandable to 48KB for $1340) Graphics for the Masses it have a graphics card?

19 Video No, History: 1977 Apple II CPU: Writes pixel data to RAM Hardware: Reads RAM in scanlines, generates NTSC there is no graphics card.

20 The History: 1982 The Geometry Engine A VLSI Geometry System for Graphics James H. Clark, Computer Systems Lab, Stanford University & Silicon Graphics, Inc. Geometry System is a... computing system for computer graphics constructed from a basic building block, the Geometry Engine.

21 History: 1982 The Geometry Engine A VLSI Geometry System for Graphics James H. Clark, Computer Systems Lab, Stanford University & Silicon Graphics, Inc. 5 Million FLOPS

22 History: 1982 The Geometry Engine A VLSI Geometry System for Graphics James H. Clark, Computer Systems Lab, Stanford University & Silicon Graphics, Inc.

23 History: 1982 The Geometry Engine Instruction Set:

24 Silicon History: 1982 IRIS - Integrated Raster Imaging System Graphics Incʼs first real-time 3D rendering engine. CPU Multibus Geometry System Raster System EtherNet

25

26 BLock History: 1985 BLIT & Commodore Amiga Image Transfer Co-processor / logic block Rapid movement and modification of memory blocks Commodore Amiga had a complete blitter in hardware, in a separate graphics processor

27 History: 1993 Silicon Graphics RealityEngine Its target capability is the rendering of lighted, smooth shaded, depth buffered, texture mapped, antialiased triangles. - RealityEngine Graphics, K. Akeley, 1993 Vertex Generation Vertex Processing Primitive Generation Primitive Processing Fragment Generation Fragment Processing Pixel Operations

28 Silicon OpenGL History: 1992 OpenGL 1.0 & 1995 Direct3D OpenGL 1.0 Graphics Proprietary IRIS GL API (state of the art) OpenGL as open standard derived from IRIS Standardised HW access, device drivers becomes important HUGE success: allows HW to evolve, SW to decouple

29 History: 1998 NVidia RIVA TNT CPU Vertex Generation Vertex Processing GPU Clip/Cull/Rast Texture Map Pixel Ops / Framebuffer Primitive Generation Primitive Processing Fragment Generation Fragment Processing Pixel Operations

30 History: 2002 Direct3D 9, OpenGL 2.0 GPU Vertex Generation Vertex Vertex Vertex Vertex Vertex Processing Clip / Cull / Rasterize Vertex Vertex Vertex Vertex Tex Tex Tex Tex Primitive Generation Primitive Processing Vertex Vertex Vertex Vertex Fragment Generation Tex Tex Tex Tex Pixel Ops / Framebuffer Fragment Processing Pixel Operations ATI Radeon 9700

31 History: 2006 Unified Shading GPUs GPU Vertex Generation Core Core Core Core Vertex Processing Clip / Cull / Rasterize Scheduler Core Core Core Core Tex Tex Tex Tex Core Core Core Core Core Core Core Core Tex Tex Tex Tex Primitive Generation Primitive Processing Fragment Generation Pixel Ops Pixel Ops Pixel Ops Fragment Processing Pixel Ops Pixel Ops Pixel Ops Pixel Operations GeForce G80

32 GRAPHICS ARCHITECTURE GPUs as Throughput-Oriented Hardware

33 Single Architecture CPU Evolution stream of instructions REALLY FAST Long, deep pipelines Branch Prediction & Speculative Execution Hierarchy of Caches Instruction Level Parallelism (ILP)

34 Architecture G5 (2003)

35

36 Architecture G5 (2003) 2 Ghz 1 Ghz FSB 4GB RAM 2 FPUs 50 million transistors 215 inst. pipeline Branch Prediction

37 Architecture G5 (2003) Fermi (2010) 2 Ghz 1 Ghz FSB 4GB RAM 1.4 Ghz 1.8 Ghz FSB 4GB RAM (1.5 in GTX) 2 FPUs 50 million transistors 215 inst. pipeline Branch Prediction

38 Architecture G5 (2003) Fermi (2010) 2 Ghz 1 Ghz FSB 4GB RAM 2 FPUs 50 million transistors 1.4 Ghz 1.8 Ghz FSB 4GB RAM (1.5 in GTX) 960 FPUs 3 billion transistors 215 inst. pipeline Branch Prediction No Branch Prediction

39 The - Architecture Mooreʼs Law number of transistors on an integrated circuit doubles every two years Gorden E. Moore

40 The - What Architecture Mooreʼs Law number of transistors on an integrated circuit doubles every two years Gorden E. Moore Matters: How we use these transistors

41 Architecture Buy Performance with Power

42 There Weʼre Architecture Serial Performance Scaling Cannot continue to scale Mhz is no 10 Ghz chip Cannot increase power consumption per area melting chips Can continue to increase transistor count

43 out-of-order vector SSE, multithreading, Architecture Using Transistors Instruction-level parallelism execution, speculation, branch prediction Data-level parallelism units, SIMD execution AVX, Cell SPE, Clearspeed Thread-level parallelism multicore, manycore

44 Architecture Why Massively Parallel Processing? Computation Power of Graphic Processing Units

45 Architecture Why Massively Parallel Processing? Memory Throughput of Graphic Processing Units

46 Architecture Why Massively Parallel Processing? How can this be? Remove transistors dedicated to speed of a single stream of instructions out-of-order execution, speculation, caches, branch prediction CPU: minimize latency of an individual thread More memory bandwidth, more compute Nothing else on the card! Simple design GPU: maximize throughput of all threads.

47 From Shader Code to a Teraflop: How Shader Cores Work Kayvon Fatahalian Stanford University SIGGRAPH 2009: Beyond Programmable Shading:

48 What s in a GPU? Shader Core Shader Core Tex Input Assembly Rasterizer Shader Core Shader Core Tex Output Blend Shader Core Shader Core Shader Core Shader Core Tex Tex Video Decode Work Distributor HW or SW? Heterogeneous chip multi-processor (highly tuned for graphics) SIGGRAPH 2009: Beyond Programmable Shading: 4

49 A diffuse reflectance shader!"#$%&'(#)*"#$+(,&-./'&0123%4".56(#),&-+( 3%4".5(%789.17'+( A( ((3%4".5(B;+( I(( Independent, but no explicit parallelism SIGGRAPH 2009: Beyond Programmable Shading: 5

50 Compile shader 1 unshaded fragment input record!"#$%&'(#)*"#$+(,&-./'&0123%4".56(#),&-+( 3%4".5(%789.17'+( 3%4".:(;733/!&*9";&'<3%4".5(=4'#>(3%4".0(/?@( A( ((3%4".5(B;+( ((B;(C(#),&-D*"#$%&<#)*"#$>(/?@+( ((B;(EC(F%"#$(<(;4.<%789.17'>(=4'#@>(GDG>(HDG@+( (('&./'=(3%4".:<B;>(HDG@+(((( I(( 2;733/!&*9";&'6J(!"#$%&('G>(?:>(.G>(!G( #/%(('5>(?G>(FKGLGM( #";;('5>(?H>(FKGLHM>('5( #";;('5>(?0>(FKGL0M>('5( F%#$('5>('5>(%<GDG@>(%<HDG@( #/%((4G>('G>('5( #/%((4H>('H>('5( #/%((40>('0>('5( #4?((45>(%<HDG@( 1 shaded fragment output record SIGGRAPH 2009: Beyond Programmable Shading: 6

51 Execute shader Fetch/ Decode ALU (Execute) Execution Context!"#$$%&'()*"'+,-. &*/01' &2. /% :2;. /*"".+73.4<3.892:<;3.+7. /*"".+73.4=3.892:=;3.+7. /%1..A /%1..A<3.+<3.+7. /%1..A=3.+=3.+7. SIGGRAPH 2009: Beyond Programmable Shading: 7

52 CPU- style cores Fetch/ Decode ALU (Execute) Execution Context Out-of-order control logic Fancy branch predictor Memory pre-fetcher Data cache (A big one) SIGGRAPH 2009: Beyond Programmable Shading: 13

53 Slimming down Fetch/ Decode ALU (Execute) Execution Context Idea #1: Remove components that help a single instruction stream run fast SIGGRAPH 2009: Beyond Programmable Shading: 14

54 Two cores (two fragments in parallel) fragment 1 fragment 2 Fetch/ Decode Fetch/ Decode!"#$$%&'()*"'+,-. &*/01' &2. /% :2;. /*"".+73.4<3.892:<;3.+7. /*"".+73.4=3.892:=; / >2?2@3.1><?2@. /%1..A /%1..A<3.+<3.+7. /%1..A=3.+=3.+7. /A4..A73.1><?2@. ALU (Execute) Execution Context ALU (Execute) Execution Context!"#$$%&'()*"'+,-. &*/01' &2. /% :2;. /*"".+73.4<3.892:<;3.+7. /*"".+73.4=3.892:=; / >2?2@3.1><?2@. /%1..A /%1..A<3.+<3.+7. /%1..A=3.+=3.+7. /A4..A73.1><?2@. SIGGRAPH 2009: Beyond Programmable Shading: 15

55 Four cores (four fragments in parallel) Fetch/ Decode ALU (Execute) Execution Context Fetch/ Decode ALU (Execute) Execution Context Fetch/ Decode ALU (Execute) Execution Context Fetch/ Decode ALU (Execute) Execution Context SIGGRAPH 2009: Beyond Programmable Shading: 16

56 Sixteen cores (sixteen fragments in parallel) ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU 16 cores = 16 simultaneous instruction streams SIGGRAPH 2009: Beyond Programmable Shading: 17

57 Instruction stream sharing But many fragments should be able to share an instruction stream!!"#$$%&'()*"'+,-. &*/01' &2. /% :2;. /*"".+73.4<3.892:<;3.+7. /*"".+73.4=3.892:=;3.+7. /%1..A /%1..A<3.+<3.+7. /%1..A=3.+=3.+7. SIGGRAPH 2009: Beyond Programmable Shading: 18

58 Recall: simple processing core Fetch/ Decode ALU (Execute) Execution Context SIGGRAPH 2009: Beyond Programmable Shading: 19

59 Add ALUs Fetch/ Decode ALU 1 ALU 2 ALU 3 ALU 4 ALU 5 ALU 6 ALU 7 ALU 8 Idea #2: Amortize cost/complexity of managing an instruction stream across many ALUs Ctx Ctx Ctx Ctx SIMD processing Ctx Ctx Ctx Ctx Shared Ctx Data SIGGRAPH 2009: Beyond Programmable Shading: 20

60 Modifying the shader Fetch/ Decode ALU 1 ALU 2 ALU 3 ALU 4 ALU 5 ALU 6 ALU 7 ALU 8 Ctx Ctx Ctx Ctx Ctx Ctx Ctx Ctx Shared Ctx Data!"#$$%&'()*"'+,-. &*/01' &2. /% :2;. /*"".+73.4<3.892:<;3.+7. /*"".+73.4=3.892:=; / >2?2@3.1><?2@. /%1..A /%1..A<3.+<3.+7. /%1..A=3.+=3.+7. /A4..A73.1><?2@. Original compiled shader: Processes one fragment using scalar ops on scalar SIGGRAPH 2009: Beyond Programmable Shading: registers 21

61 Modifying the shader Fetch/ Decode ALU 1 ALU 2 ALU 3 ALU 4 ALU 5 ALU 6 ALU 7 ALU 8 Ctx Ctx Ctx Ctx!"#$%&'())*+,-./',0123 "#$%&+/456,37,8&09:37,8&7;:3<9:37,8&+93 "#$%&4*6337,8&0=:37,8&79:38>9?9@3 "#$%&4/''37,8&0=:37,8&7A:38>9?A@:37,8&0=3 "#$%&4/''37,8&0=:37,8&7B:38>9?B@:37,8&0=3 "#$%&864537,8&0=:37,8&0=:36C9D9E:36CAD9E3 "#$%&4*6337,8&F9:37,8&09:37,8&0=3 "#$%&4*6337,8&FA:37,8&0A:37,8&0=3 "#$%&4*6337,8&FB:37,8&0B:37,8&0=3 "#$%&4F7337,8&F=:36CAD9E3 Ctx Ctx Ctx Ctx Shared Ctx Data SIGGRAPH 2009: Beyond Programmable Shading: New compiled shader: Processes 8 fragments using vector ops on vector registers 22

62 Modifying the shader Fetch/ Decode ALU 1 ALU 2 ALU 3 ALU 4 ALU 5 ALU 6 ALU 7 ALU 8 Ctx Ctx Ctx Ctx !"#$%&'())*+,-./',0123 "#$%&+/456,37,8&09:37,8&7;:3<9:37,8&+93 "#$%&4*6337,8&0=:37,8&79:38>9?9@3 "#$%&4/''37,8&0=:37,8&7A:38>9?A@:37,8&0=3 "#$%&4/''37,8&0=:37,8&7B:38>9?B@:37,8&0=3 "#$%&864537,8&0=:37,8&0=:36C9D9E:36CAD9E3 "#$%&4*6337,8&F9:37,8&09:37,8&0=3 "#$%&4*6337,8&FA:37,8&0A:37,8&0=3 "#$%&4*6337,8&FB:37,8&0B:37,8&0=3 "#$%&4F7337,8&F=:36CAD9E3 Ctx Ctx Ctx Ctx Shared Ctx Data SIGGRAPH 2009: Beyond Programmable Shading: 23

63 128 fragments in parallel SIGGRAPH 2009: Beyond Programmable Shading: 16 cores = 128 ALUs = 16 simultaneous instruction streams 24

64 vertices / fragments primitives 128 [ ] in parallel CUDA threads OpenCL work items compute shader threads primitives vertices fragments SIGGRAPH 2009: Beyond Programmable Shading: 25

65 What is the problem?

66 But what about branches? Time (clocks) ALU 1 ALU ALU 8 T T F T F F F F./01203!4!205,# #123+&#!"#$%#&#'(#)# 7+",#:#9#A#@5>### *#+,-+#)# %#:#'>## *# 9#:#;2<$%=#+%;(># 9#?:#@-># 7+",#:#@5>###.7+-/8+#/01203!4!205,# #123+&# SIGGRAPH 2009: Beyond Programmable Shading: 27

67 But what about branches? Time (clocks) ALU 1 ALU ALU 8 T T F T F F F F Not all ALUs do useful work! Worst case: 1/8 performance./01203!4!205,# #123+&#!"#$%#&#'(#)# 7+",#:#9#A#@5>### *#+,-+#)# %#:#'>## *# 9#:#;2<$%=#+%;(># 9#?:#@-># 7+",#:#@5>###.7+-/8+#/01203!4!205,# #123+&# SIGGRAPH 2009: Beyond Programmable Shading: 28

68 Plenty more intricacies! No time now - See the Beyond Programmable Shading Siggraph talk Think of a GPU as a multi-core processor optimized for maximum throughput: Many SIMD cores working together.

69 Thousands Amenable An efficient GPU workload... on independent pieces of work Uses many ALUs on many cores to instruction stream sharing Uses SIMD instructions Compute-heavy: lots of math for each memory access

70 30 GF100 Architecture SMʼs on GTX480 Resources: 2 x 16 Cores 16 Load/Store units 4 Special Functions Sin/Cos/Sqrt 16 Double Precision units 1.44 Terra FLOPS

71 Mental On GF100 Architecture Model: every clock cycle, you can assign an instruction to two of these resources Resources: 2 x 16 Cores 16 Load/Store units 4 Special Functions 16 Double Precision units

72 GPGPU General Purpose Computing on GPUs

73 C/C++ GPGPU CUDA Stream Programming extended with: Kernels - function executed N times in parallel CPU/GPU Synchronization GPU Memory Management

74 GPGPU CUDA Stream Programming

75 Example: Vector Addition Kernel Device Code!!"#$%&'()"*)+($,"-'%"#"."/01!!"23+4"(4,)35"&),6$,%-"$7)"&38,9:8-)"3558(8$7 ;;<=$>3=;; C 87( 8."(4,)35D5EFE #J8K"."/J8K"0"1J8KI L 87( %387?B C!!"M'7"<,85"$6"N!OPQ">=$+G-"$6"OPQ"(4,)35-")3+4 *)+/55RRR"N!OPQA"OPQSSS?5;/A"5;1A"5;#BI L

76 Example: Vector Addition Kernel!!"#$%&'()"*)+($,"-'%"#"."/01!!"23+4"(4,)35"&),6$,%-"$7)"&38,9:8-)"3558(8$7 ;;<=$>3=;; C 87("8"."(4,)35D5EFE #J8K"."/J8K"0"1J8KI L 87("%387?B C Host Code!!"M'7"<,85"$6"N!OPQ">=$+G-"$6"OPQ"(4,)35-")3+4 *)+/55RRR"N!OPQA"OPQSSS?5;/A"5;1A"5;#BI L

77

78 Kernel Variations and Output global void kernel( int *a ) { int idx = blockidx.x*blockdim.x + threadidx.x; a[idx] = 7; } Output: global void kernel( int *a ) { int idx = blockidx.x*blockdim.x + threadidx.x; a[idx] = blockidx.x; } Output: global void kernel( int *a ) { int idx = blockidx.x*blockdim.x + threadidx.x; a[idx] = threadidx.x; } Output:

79 Example: Shuffling Data!!"#$%&'$&"()*+$,"-),$'"%."/$0,!!"1)23"43&$)'"5%($,"%.$"$*$5$.4 667*%-)*66 J A"43&$)'B'CDC E"-*%2/F85DC ;"-*%2/B'CDCG.$>6)&&)0H8I"A"<&$(6)&&)0H8.'82$,H8IIG Host Code 8.4 J

80 Need GPGPU Alternatives OpenCL Attempts to be OpenGL for GPGPU Almost identical to CUDA for a higher level languages Jacket for MATLAB PyCUDA

81 The Future Massively Multi-Core Processors

82 The Future MultiCore is Dead You have an array of six-core CPUs in your rack? You re going to feel pretty stupid when all the cool admins start popping eight-core chips.

83 Number Cannot The Future Many-Core? of cores so large that: Traditional caching models donʼt work keep coherent cache Network on a chip?

84 Havenʼt Programming The Future Memory Models & More even touched it Coherent and Uncoherent caches Uniform vs Non-Uniform Memory Access? TM? Special Purpose Hardware? Schedulers? Languages

85 Up 30% Similar The Future Intel Nehalem to 12 cores lower power usage programming abstractions: 12 cores, each with 128-bit wide SIMD units (SSE) Intel SandyBridge 256-bit wide SIMD units (AVX)

86 Putting Graphics The Future Parallelism Everywhere all those transistors to use Many ALUs Many Cores Intricate Cache Hierarchies Very difficult to program is way ahead of the game

87 The The Future Learn More: Landscape of Parallel Computing Research

88 Kayvon Mike Jared Thank you! Acknowledgements Fatahalian Many of these slides are inspired by or copied from him Houston CS448s Beyond Programmable Shading Hobernock & David Tarjan CS193g Programming massively parallel processors (I TAʼd this last quarter)

From Shader Code to a Teraflop: How GPU Shader Cores Work. Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian)

From Shader Code to a Teraflop: How GPU Shader Cores Work. Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian) From Shader Code to a Teraflop: How GPU Shader Cores Work Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian) 1 This talk Three major ideas that make GPU processing cores run fast Closer look at real

More information

From Shader Code to a Teraflop: How Shader Cores Work

From Shader Code to a Teraflop: How Shader Cores Work From Shader Code to a Teraflop: How Shader Cores Work Kayvon Fatahalian Stanford University This talk 1. Three major ideas that make GPU processing cores run fast 2. Closer look at real GPU designs NVIDIA

More information

GPU Architecture. Robert Strzodka (MPII), Dominik Göddeke G. TUDo), Dominik Behr (AMD)

GPU Architecture. Robert Strzodka (MPII), Dominik Göddeke G. TUDo), Dominik Behr (AMD) GPU Architecture Robert Strzodka (MPII), Dominik Göddeke G (TUDo( TUDo), Dominik Behr (AMD) Conference on Parallel Processing and Applied Mathematics Wroclaw, Poland, September 13-16, 16, 2009 www.gpgpu.org/ppam2009

More information

Real-Time Rendering Architectures

Real-Time Rendering Architectures Real-Time Rendering Architectures Mike Houston, AMD Part 1: throughput processing Three key concepts behind how modern GPU processing cores run code Knowing these concepts will help you: 1. Understand

More information

Lecture 7: The Programmable GPU Core. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Lecture 7: The Programmable GPU Core. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011) Lecture 7: The Programmable GPU Core Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011) Today A brief history of GPU programmability Throughput processing core 101 A detailed

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

Antonio R. Miele Marco D. Santambrogio

Antonio R. Miele Marco D. Santambrogio Advanced Topics on Heterogeneous System Architectures GPU Politecnico di Milano Seminar Room A. Alario 18 November, 2015 Antonio R. Miele Marco D. Santambrogio Politecnico di Milano 2 Introduction First

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010 Introduction to Multicore architecture Tao Zhang Oct. 21, 2010 Overview Part1: General multicore architecture Part2: GPU architecture Part1: General Multicore architecture Uniprocessor Performance (ECint)

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

Scientific Computing on GPUs: GPU Architecture Overview

Scientific Computing on GPUs: GPU Architecture Overview Scientific Computing on GPUs: GPU Architecture Overview Dominik Göddeke, Jakub Kurzak, Jan-Philipp Weiß, André Heidekrüger and Tim Schröder PPAM 2011 Tutorial Toruń, Poland, September 11 http://gpgpu.org/ppam11

More information

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8 Markus Hadwiger, KAUST Reading Assignment #5 (until March 12) Read (required): Programming Massively Parallel Processors book, Chapter

More information

Current Trends in Computer Graphics Hardware

Current Trends in Computer Graphics Hardware Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)

More information

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics Why GPU? Chapter 1 Graphics Hardware Graphics Processing Unit (GPU) is a Subsidiary hardware With massively multi-threaded many-core Dedicated to 2D and 3D graphics Special purpose low functionality, high

More information

GPU Architecture. Michael Doggett Department of Computer Science Lund university

GPU Architecture. Michael Doggett Department of Computer Science Lund university GPU Architecture Michael Doggett Department of Computer Science Lund university GPUs from my time at ATI R200 Xbox360 GPU R630 R610 R770 Let s start at the beginning... Graphics Hardware before GPUs 1970s

More information

Threading Hardware in G80

Threading Hardware in G80 ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &

More information

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 588/688. Graphics Processors Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly

More information

Administrivia. HW0 scores, HW1 peer-review assignments out. If you re having Cython trouble with HW2, let us know.

Administrivia. HW0 scores, HW1 peer-review assignments out. If you re having Cython trouble with HW2, let us know. Administrivia HW0 scores, HW1 peer-review assignments out. HW2 out, due Nov. 2. If you re having Cython trouble with HW2, let us know. Review on Wednesday: Post questions on Piazza Introduction to GPUs

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

GPUs and GPGPUs. Greg Blanton John T. Lubia

GPUs and GPGPUs. Greg Blanton John T. Lubia GPUs and GPGPUs Greg Blanton John T. Lubia PROCESSOR ARCHITECTURAL ROADMAP Design CPU Optimized for sequential performance ILP increasingly difficult to extract from instruction stream Control hardware

More information

Parallel Programming for Graphics

Parallel Programming for Graphics Beyond Programmable Shading Course ACM SIGGRAPH 2010 Parallel Programming for Graphics Aaron Lefohn Advanced Rendering Technology (ART) Intel What s In This Talk? Overview of parallel programming models

More information

PowerVR Hardware. Architecture Overview for Developers

PowerVR Hardware. Architecture Overview for Developers Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Architectures Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Overview of today s lecture The idea is to cover some of the existing graphics

More information

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1

Lecture 15: Introduction to GPU programming. Lecture 15: Introduction to GPU programming p. 1 Lecture 15: Introduction to GPU programming Lecture 15: Introduction to GPU programming p. 1 Overview Hardware features of GPGPU Principles of GPU programming A good reference: David B. Kirk and Wen-mei

More information

Spring 2009 Prof. Hyesoon Kim

Spring 2009 Prof. Hyesoon Kim Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Graphics Processing Unit Architecture (GPU Arch)

Graphics Processing Unit Architecture (GPU Arch) Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics

More information

A Data-Parallel Genealogy: The GPU Family Tree. John Owens University of California, Davis

A Data-Parallel Genealogy: The GPU Family Tree. John Owens University of California, Davis A Data-Parallel Genealogy: The GPU Family Tree John Owens University of California, Davis Outline Moore s Law brings opportunity Gains in performance and capabilities. What has 20+ years of development

More information

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský Real - Time Rendering Graphics pipeline Michal Červeňanský Juraj Starinský Overview History of Graphics HW Rendering pipeline Shaders Debugging 2 History of Graphics HW First generation Second generation

More information

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary Cornell University CS 569: Interactive Computer Graphics Introduction Lecture 1 [John C. Stone, UIUC] 2008 Steve Marschner 1 2008 Steve Marschner 2 NASA University of Calgary 2008 Steve Marschner 3 2008

More information

Graphics and Imaging Architectures

Graphics and Imaging Architectures Graphics and Imaging Architectures Kayvon Fatahalian http://www.cs.cmu.edu/afs/cs/academic/class/15869-f11/www/ About Kayvon New faculty, just arrived from Stanford Dissertation: Evolving real-time graphics

More information

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university Graphics Architectures and OpenCL Michael Doggett Department of Computer Science Lund university Overview Parallelism Radeon 5870 Tiled Graphics Architectures Important when Memory and Bandwidth limited

More information

CONSOLE ARCHITECTURE

CONSOLE ARCHITECTURE CONSOLE ARCHITECTURE Introduction Part 1 What is a console? Console components Differences between consoles and PCs Benefits of console development The development environment Console game design What

More information

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1 X. GPU Programming 320491: Advanced Graphics - Chapter X 1 X.1 GPU Architecture 320491: Advanced Graphics - Chapter X 2 GPU Graphics Processing Unit Parallelized SIMD Architecture 112 processing cores

More information

Windowing System on a 3D Pipeline. February 2005

Windowing System on a 3D Pipeline. February 2005 Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April

More information

What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others

What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University * slides thanks to Kavita Bala & many others Final Project Demo Sign-Up: Will be posted outside my office after lecture today.

More information

Accelerating image registration on GPUs

Accelerating image registration on GPUs Accelerating image registration on GPUs Harald Köstler, Sunil Ramgopal Tatavarty SIAM Conference on Imaging Science (IS10) 13.4.2010 Contents Motivation: Image registration with FAIR GPU Programming Combining

More information

GPU! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room, Bld 20! 11 December, 2017!

GPU! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room, Bld 20! 11 December, 2017! Advanced Topics on Heterogeneous System Architectures GPU! Politecnico di Milano! Seminar Room, Bld 20! 11 December, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2 Introduction!

More information

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008 Michael Doggett Graphics Architecture Group April 2, 2008 Graphics Processing Unit Architecture CPUs vsgpus AMD s ATI RADEON 2900 Programming Brook+, CAL, ShaderAnalyzer Architecture Challenges Accelerated

More information

Introduction to CUDA (1 of n*)

Introduction to CUDA (1 of n*) Agenda Introduction to CUDA (1 of n*) GPU architecture review CUDA First of two or three dedicated classes Joseph Kider University of Pennsylvania CIS 565 - Spring 2011 * Where n is 2 or 3 Acknowledgements

More information

GPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU.

GPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU. Basics of s Basics Introduction to Why vs CPU S. Sundar and Computing architecture August 9, 2014 1 / 70 Outline Basics of s Why vs CPU Computing architecture 1 2 3 of s 4 5 Why 6 vs CPU 7 Computing 8

More information

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Analyzing a 3D Graphics Workload Where is most of the work done? Memory Vertex

More information

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin

EE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin EE382 (20): Computer Architecture - ism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez The University of Texas at Austin 1 Recap 2 Streaming model 1. Use many slimmed down cores to run in parallel

More information

! Readings! ! Room-level, on-chip! vs.!

! Readings! ! Room-level, on-chip! vs.! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads

More information

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Today Finishing up from last time Brief discussion of graphics workload metrics

More information

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console

Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture

More information

Multi-Processors and GPU

Multi-Processors and GPU Multi-Processors and GPU Philipp Koehn 7 December 2016 Predicted CPU Clock Speed 1 Clock speed 1971: 740 khz, 2016: 28.7 GHz Source: Horowitz "The Singularity is Near" (2005) Actual CPU Clock Speed 2 Clock

More information

Efficient and Scalable Shading for Many Lights

Efficient and Scalable Shading for Many Lights Efficient and Scalable Shading for Many Lights 1. GPU Overview 2. Shading recap 3. Forward Shading 4. Deferred Shading 5. Tiled Deferred Shading 6. And more! First GPU Shaders Unified Shaders CUDA OpenCL

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits

More information

Mattan Erez. The University of Texas at Austin

Mattan Erez. The University of Texas at Austin EE382V (17325): Principles in Computer Architecture Parallelism and Locality Fall 2007 Lecture 12 GPU Architecture (NVIDIA G80) Mattan Erez The University of Texas at Austin Outline 3D graphics recap and

More information

GeForce4. John Montrym Henry Moreton

GeForce4. John Montrym Henry Moreton GeForce4 John Montrym Henry Moreton 1 Architectural Drivers Programmability Parallelism Memory bandwidth 2 Recent History: GeForce 1&2 First integrated geometry engine & 4 pixels/clk Fixed-function transform,

More information

CSCI-GA Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs

CSCI-GA Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com History of GPUs

More information

Accelerator cards are typically PCIx cards that supplement a host processor, which they require to operate Today, the most common accelerators include

Accelerator cards are typically PCIx cards that supplement a host processor, which they require to operate Today, the most common accelerators include 3.1 Overview Accelerator cards are typically PCIx cards that supplement a host processor, which they require to operate Today, the most common accelerators include GPUs (Graphics Processing Units) AMD/ATI

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

GPU Architecture and Function. Michael Foster and Ian Frasch

GPU Architecture and Function. Michael Foster and Ian Frasch GPU Architecture and Function Michael Foster and Ian Frasch Overview What is a GPU? How is a GPU different from a CPU? The graphics pipeline History of the GPU GPU architecture Optimizations GPU performance

More information

ECE 574 Cluster Computing Lecture 16

ECE 574 Cluster Computing Lecture 16 ECE 574 Cluster Computing Lecture 16 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 26 March 2019 Announcements HW#7 posted HW#6 and HW#5 returned Don t forget project topics

More information

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27

GPU Programming. Lecture 1: Introduction. Miaoqing Huang University of Arkansas 1 / 27 1 / 27 GPU Programming Lecture 1: Introduction Miaoqing Huang University of Arkansas 2 / 27 Outline Course Introduction GPUs as Parallel Computers Trend and Design Philosophies Programming and Execution

More information

Graphics Hardware. Instructor Stephen J. Guy

Graphics Hardware. Instructor Stephen J. Guy Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!

More information

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most

More information

GPU for HPC. October 2010

GPU for HPC. October 2010 GPU for HPC Simone Melchionna Jonas Latt Francis Lapique October 2010 EPFL/ EDMX EPFL/EDMX EPFL/DIT simone.melchionna@epfl.ch jonas.latt@epfl.ch francis.lapique@epfl.ch 1 Moore s law: in the old days,

More information

GPU A rchitectures Architectures Patrick Neill May

GPU A rchitectures Architectures Patrick Neill May GPU Architectures Patrick Neill May 30, 2014 Outline CPU versus GPU CUDA GPU Why are they different? Terminology Kepler/Maxwell Graphics Tiled deferred rendering Opportunities What skills you should know

More information

GPGPU introduction and network applications. PacketShaders, SSLShader

GPGPU introduction and network applications. PacketShaders, SSLShader GPGPU introduction and network applications PacketShaders, SSLShader Agenda GPGPU Introduction Computer graphics background GPGPUs past, present and future PacketShader A GPU-Accelerated Software Router

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

Parallel Programming on Larrabee. Tim Foley Intel Corp

Parallel Programming on Larrabee. Tim Foley Intel Corp Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This

More information

Introduction to Parallel Programming For Real-Time Graphics (CPU + GPU)

Introduction to Parallel Programming For Real-Time Graphics (CPU + GPU) Introduction to Parallel Programming For Real-Time Graphics (CPU + GPU) Aaron Lefohn, Intel / University of Washington Mike Houston, AMD / Stanford 1 What s In This Talk? Overview of parallel programming

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

GRAPHICS PROCESSING UNITS

GRAPHICS PROCESSING UNITS GRAPHICS PROCESSING UNITS Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 4, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011

More information

Real-time Graphics 9. GPGPU

Real-time Graphics 9. GPGPU 9. GPGPU GPGPU GPU (Graphics Processing Unit) Flexible and powerful processor Programmability, precision, power Parallel processing CPU Increasing number of cores Parallel processing GPGPU general-purpose

More information

GPU Computation Strategies & Tricks. Ian Buck NVIDIA

GPU Computation Strategies & Tricks. Ian Buck NVIDIA GPU Computation Strategies & Tricks Ian Buck NVIDIA Recent Trends 2 Compute is Cheap parallelism to keep 100s of ALUs per chip busy shading is highly parallel millions of fragments per frame 0.5mm 64-bit

More information

Beyond Programmable Shading Keeping Many Cores Busy: Scheduling the Graphics Pipeline

Beyond Programmable Shading Keeping Many Cores Busy: Scheduling the Graphics Pipeline Keeping Many s Busy: Scheduling the Graphics Pipeline Jonathan Ragan-Kelley, MIT CSAIL 29 July 2010 This talk How to think about scheduling GPU-style pipelines Four constraints which drive scheduling decisions

More information

Anatomy of AMD s TeraScale Graphics Engine

Anatomy of AMD s TeraScale Graphics Engine Anatomy of AMD s TeraScale Graphics Engine Mike Houston Design Goals Focus on Efficiency f(perf/watt, Perf/$) Scale up processing power and AA performance Target >2x previous generation Enhance stream

More information

Real-Time Reyes: Programmable Pipelines and Research Challenges. Anjul Patney University of California, Davis

Real-Time Reyes: Programmable Pipelines and Research Challenges. Anjul Patney University of California, Davis Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis Real-Time Reyes-Style Adaptive Surface Subdivision Anjul Patney and John D. Owens SIGGRAPH Asia

More information

Pat Hanrahan. Modern Graphics Pipeline. How Powerful are GPUs? Application. Command. Geometry. Rasterization. Fragment. Display.

Pat Hanrahan. Modern Graphics Pipeline. How Powerful are GPUs? Application. Command. Geometry. Rasterization. Fragment. Display. How Powerful are GPUs? Pat Hanrahan Computer Science Department Stanford University Computer Forum 2007 Modern Graphics Pipeline Application Command Geometry Rasterization Texture Fragment Display Page

More information

ECE 571 Advanced Microprocessor-Based Design Lecture 20

ECE 571 Advanced Microprocessor-Based Design Lecture 20 ECE 571 Advanced Microprocessor-Based Design Lecture 20 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 12 April 2016 Project/HW Reminder Homework #9 was posted 1 Raspberry Pi

More information

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC

GPGPUs in HPC. VILLE TIMONEN Åbo Akademi University CSC GPGPUs in HPC VILLE TIMONEN Åbo Akademi University 2.11.2010 @ CSC Content Background How do GPUs pull off higher throughput Typical architecture Current situation & the future GPGPU languages A tale of

More information

Shaders. Slide credit to Prof. Zwicker

Shaders. Slide credit to Prof. Zwicker Shaders Slide credit to Prof. Zwicker 2 Today Shader programming 3 Complete model Blinn model with several light sources i diffuse specular ambient How is this implemented on the graphics processor (GPU)?

More information

From Brook to CUDA. GPU Technology Conference

From Brook to CUDA. GPU Technology Conference From Brook to CUDA GPU Technology Conference A 50 Second Tutorial on GPU Programming by Ian Buck Adding two vectors in C is pretty easy for (i=0; i

More information

A Data-Parallel Genealogy: The GPU Family Tree

A Data-Parallel Genealogy: The GPU Family Tree A Data-Parallel Genealogy: The GPU Family Tree Department of Electrical and Computer Engineering Institute for Data Analysis and Visualization University of California, Davis Outline Moore s Law brings

More information

GPU Architecture. Samuli Laine NVIDIA Research

GPU Architecture. Samuli Laine NVIDIA Research GPU Architecture Samuli Laine NVIDIA Research Today The graphics pipeline: Evolution of the GPU Throughput-optimized parallel processor design I.e., the GPU Contrast with latency-optimized (CPU-like) design

More information

CE 431 Parallel Computer Architecture Spring Graphics Processor Units (GPU) Architecture

CE 431 Parallel Computer Architecture Spring Graphics Processor Units (GPU) Architecture CE 431 Parallel Computer Architecture Spring 2017 Graphics Processor Units (GPU) Architecture Nikos Bellas Computer and Communications Engineering Department University of Thessaly Some slides borrowed

More information

Real-time Graphics 9. GPGPU

Real-time Graphics 9. GPGPU Real-time Graphics 9. GPGPU GPGPU GPU (Graphics Processing Unit) Flexible and powerful processor Programmability, precision, power Parallel processing CPU Increasing number of cores Parallel processing

More information

Motivation Hardware Overview Programming model. GPU computing. Part 1: General introduction. Ch. Hoelbling. Wuppertal University

Motivation Hardware Overview Programming model. GPU computing. Part 1: General introduction. Ch. Hoelbling. Wuppertal University Part 1: General introduction Ch. Hoelbling Wuppertal University Lattice Practices 2011 Outline 1 Motivation 2 Hardware Overview History Present Capabilities 3 Programming model Past: OpenGL Present: CUDA

More information

Numerical Simulation on the GPU

Numerical Simulation on the GPU Numerical Simulation on the GPU Roadmap Part 1: GPU architecture and programming concepts Part 2: An introduction to GPU programming using CUDA Part 3: Numerical simulation techniques (grid and particle

More information

Scanline Rendering 2 1/42

Scanline Rendering 2 1/42 Scanline Rendering 2 1/42 Review 1. Set up a Camera the viewing frustum has near and far clipping planes 2. Create some Geometry made out of triangles 3. Place the geometry in the scene using Transforms

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 35 Course outline Introduction to GPU hardware

More information

Lecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Lecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011) Lecture 6: Texture Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011) Today: texturing! Texture filtering - Texture access is not just a 2D array lookup ;-) Memory-system implications

More information

The NVIDIA GeForce 8800 GPU

The NVIDIA GeForce 8800 GPU The NVIDIA GeForce 8800 GPU August 2007 Erik Lindholm / Stuart Oberman Outline GeForce 8800 Architecture Overview Streaming Processor Array Streaming Multiprocessor Texture ROP: Raster Operation Pipeline

More information

Compute-mode GPU Programming Interfaces

Compute-mode GPU Programming Interfaces Lecture 8: Compute-mode GPU Programming Interfaces Visual Computing Systems What is a programming model? Programming models impose structure A programming model provides a set of primitives/abstractions

More information

Josef Pelikán, Jan Horáček CGG MFF UK Praha

Josef Pelikán, Jan Horáček CGG MFF UK Praha GPGPU and CUDA 2012-2018 Josef Pelikán, Jan Horáček CGG MFF UK Praha pepca@cgg.mff.cuni.cz http://cgg.mff.cuni.cz/~pepca/ 1 / 41 Content advances in hardware multi-core vs. many-core general computing

More information

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1 Markus Hadwiger, KAUST Reading Assignment #2 (until Feb. 17) Read (required): GLSL book, chapter 4 (The OpenGL Programmable

More information

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp Next-Generation Graphics on Larrabee Tim Foley Intel Corp Motivation The killer app for GPGPU is graphics We ve seen Abstract models for parallel programming How those models map efficiently to Larrabee

More information

Scheduling the Graphics Pipeline on a GPU

Scheduling the Graphics Pipeline on a GPU Lecture 20: Scheduling the Graphics Pipeline on a GPU Visual Computing Systems Today Real-time 3D graphics workload metrics Scheduling the graphics pipeline on a modern GPU Quick aside: tessellation Triangle

More information

Evolution of GPUs Chris Seitz

Evolution of GPUs Chris Seitz Evolution of GPUs Chris Seitz Overview Concepts: Real-time rendering Hardware graphics pipeline Evolution of the PC hardware graphics pipeline: 1995-1998: Texture mapping and z-buffer 1998: Multitexturing

More information

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital

More information

Spring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett

Spring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett Spring 2010 Prof. Hyesoon Kim AMD presentations from Richard Huddy and Michael Doggett Radeon 2900 2600 2400 Stream Processors 320 120 40 SIMDs 4 3 2 Pipelines 16 8 4 Texture Units 16 8 4 Render Backens

More information

CS130 : Computer Graphics. Tamar Shinar Computer Science & Engineering UC Riverside

CS130 : Computer Graphics. Tamar Shinar Computer Science & Engineering UC Riverside CS130 : Computer Graphics Tamar Shinar Computer Science & Engineering UC Riverside Raster Devices and Images Raster Devices Hearn, Baker, Carithers Raster Display Transmissive vs. Emissive Display anode

More information

2.11 Particle Systems

2.11 Particle Systems 2.11 Particle Systems 320491: Advanced Graphics - Chapter 2 152 Particle Systems Lagrangian method not mesh-based set of particles to model time-dependent phenomena such as snow fire smoke 320491: Advanced

More information

Preparing seismic codes for GPUs and other

Preparing seismic codes for GPUs and other Preparing seismic codes for GPUs and other many-core architectures Paulius Micikevicius Developer Technology Engineer, NVIDIA 2010 SEG Post-convention Workshop (W-3) High Performance Implementations of

More information