GPU Architecture. Michael Doggett Department of Computer Science Lund university
|
|
- Claribel Simmons
- 6 years ago
- Views:
Transcription
1 GPU Architecture Michael Doggett Department of Computer Science Lund university
2 GPUs from my time at ATI R200 Xbox360 GPU R630 R610 R770
3 Let s start at the beginning...
4 Graphics Hardware before GPUs 1970s s Very Expensive Real-time Rendering needs a lot of compute power Military, large industrial, flight simulators Evans and Sutherland founded 1968 Custom built systems, multiple boards Custom and off-the-shelf ASICs Today s GPU has similar structure, but all on one chip
5 Pixel-Planes/Flow Research Systems University of North Carolina Pixel-Planes 5 [Fuchs89] PixelFlow [Molnar92] Programmable Shading [Lastra95] Final version built and sold by HP 97 Image courtesy [Lastra95]
6 Silicon Graphics Onyx MIPS R4400 RISC processors Reality Engine 92[Akeley93] Geometry Engine 50MHz Intel i860xp, Raster Memory, 1-4 OpenGL, no shaders Images courtesy [Akeley93]
7 Silicon Graphics Onyx Infinite Reality [Montrym97] Images courtesy [Montrym97]
8 Neon: A (Big) (Fast) Single-Chip 3D Workstation Graphics Accelerator Early GPU design (1999) with publication! 8 x 32bit memory controllers Commodity SDRAMs 2D tiled memory interleaving 350 nm, 17.3 x 17.3 mm, 6.8 million transistors, 100MHz
9 GPUs
10 Graphics Processing Unit GPU How to turn millions of triangles into pixels in 1/60 of a second? Parallelism!! Pipelining Single Instruction Multiple Data (SIMD) Multiple Instruction Multiple Data (MIMD) multiple cores Memory Bandwidth reductions (covered previously) Data 0 Data 1 Data 2 Data 3 1 Instruction PU PU PU PU 4-wide SIMD PU=processing unit
11 GPU design parameters Competition 2 strong competitors NVIDIA and AMD (can consider Intel a third) Performance/Dollar Moore s law Number of transistors on a chip doubles every two years APIs (OpenGL and DirectX) hide details Hide architectural changes Provide backward compatibility (mostly)
12 Before programmable GPUs ATI Founded D - Mach series 3D - Rage series nvidia Founded 1993 Riva, Riva TNT series up to DirectX 6 Textures Rasterization Texturing Z & Alpha Image FrameBuffer
13 Hardware Transform, Clipping and Lighting NV10 99 R Nvidia GeForce 256 The first GPU ATI Radeon 7500 Transformation requires 32bit float 4x4 matrix multiplication Texturing for 4 component (RGBA) 8bit pixels Low precision math DirectX 7 Vertices Textures Image Transform Rasterization Texturing Z & Alpha FrameBuffer
14 Programmable GPUs 1st Generation Shaders run programs that do what fixed function hardware did GPU programs (shaders) have simpler requirements than CPU programs Vertices Shaders Textures Shaders Image Vertex Transform shader Rasterization Pixel Texturing shader Z & Alpha FrameBuffer
15 Programmable GPUs 1st Generation NV20 01 Nvidia GeForce 3 [Lindholm01] R Vertices Shaders Vertex Transform shader Rasterization ATI Radeon wide Vertex shader 4-wide SIMD Pixel shader Textures Shaders Pixel Texturing shader Z & Alpha Fixed point ~16bits Image FrameBuffer
16 Programmable GPUs 1st Generation DirectX8 Pixel Shaders Multiple versions 1.1, 1.3, instructions assembler language Vertices Shaders Textures Shaders Vertex Transform shader Rasterization Pixel Texturing shader Z & Alpha Image FrameBuffer
17 Programmable GPUs 2nd Generation R ATI Radeon bit memory bus 4-wide Vertex shader 8-wide SIMD Pixel shader 24bit float Vertices Shaders Textures Shaders Vertex Transform shader Rasterization Pixel Texturing shader Z & Alpha Image FrameBuffer
18 Programmable GPUs 2nd Generation NV30 03 Nvidia GeForce FX and 32 bit float Cg (C for graphics) NV40 04 GeForce 6800 [Montrym05] DirectX 9 High Level Shading Language (HLSL) Vertices Shaders Textures Shaders Image Vertex Transform shader Rasterization Pixel Texturing shader Z & Alpha FrameBuffer
19 Unified Shaders
20 Unified Shader Vertex shader Rasterization Pixel shader Z & Alpha FrameBuffer
21 Unified Shader Grouper Rasterization Unified shader Z & Alpha FrameBuffer
22 Unified Shader A revolutionary step in Graphics Hardware One hardware design that performs both Vertex and Pixel shaders Vertex processing power Vertices Vertices Pixels Vertex Shader Unified Shader Pixels Pixel Shader 22
23 Unified Shader GPU based vertex and pixel load balancing Better vertex and pixel resource usage Union of features E.g. Control flow, indexable constant, All 32bit floating point DX9 Shader Model 3.0+ Shader write at address instruction Vertices Pixels 3 SIMDs Unified Shader 48 ALUs Vec4 Scalar 23
24 XBox360 GPU architecture - 05 GPU Primitive Setup Clipper Rasterizer Hierarchical Z/S Index Stream Generator Tessellator Vertex Pipeline Pixel Pipeline Display Pixels Unified Shader Texture/Vertex Fetch UNIFIED MEMORY Output Buffer Memory Export SIMD is 16 vector math units wide - Shared instr. decode and control - Makes GPUs simpler than CPUs DAUGHTER DIE 24
25 Rendering performance Z & Alpha 2 Dies (1 GPU, 1 Logic Embedded DRAM) GPU to Daughter Die high speed interface 8 pixels/clk 32BPP color, 4 samples/pixel Z - Lossless compression 16 pixels/clk Double Z 4 samples/pixel Z - Lossless compression GPU 32GB/s DAUGHTER DIE 25
26 Rendering performance Z & Alpha Z and Alpha logic to EDRAM interface 256GB/s Color and Z - 32 samples 32bit color, 24bit Z, 8bit stencil Double Z - 64 samples 24bit Z, 8bit stencil DAUGHTER DIE 8pix/clk, 4x MSAA, Stencil and Z test, Alpha blending 256GB/s 10MB EDRAM 26
27 2013 Xbox360 GTAV
28 Unified Shader Grouper Rasterization Unified shader Z & Alpha FrameBuffer
29 Unified Shaders - PC G80 06 (Tesla [Lindholm08]) nvidia GeForce Texture Processor Cluster (TPC) 2 Streaming Multiprocessor (MP) 8 Streaming processors (SP), Multiply Add (MAD) MHz Level 2 cache placed close to the individual memories (DRAMs) DirectX 10 Shader Shared Memory CUDA - new compute focused API
30 NVIDIA G80 Images courtesy [Lindholm08]
31 Unified Shaders - PC R ATI Radeon SIMDs 16 Stream Processing Units 5 Scalar Units, MAD MHz DirectX 10
32 R600 Top Level Red Compute Rasterizer Yellow Cache Unified shader Thread Scheduler Shader R/W Instr./Const. cache Unified texture cache Compression 4 SIMDs 16 Vector Units per SIMD 5 ALUs per Vector Unit Unified Texture Cache L2 256KB L1 32KB Z + Alpha 32 August 2007 Radeon HD 2900
33 Fast Thread Switching enables GPU arithmetic intensity 1000s of small simple threads with context on chip GPUs hide large memory request latency by switching between threads time (on one processor) 1 Read sleep ALU ops 2 Read ALU ops 3 Read ALU ops 4 Read ALU ops mmxii mcd
34 How many instructions in shaders? ATI Ruby Demos 1-3 DX9 4 DX10 34 instructions Slide courtesy Norm Rubin, AMD
35 R ATI Radeon 4870! 10 SIMDs SIMD = 16 x 5 ALU! New memory architecture L2 cache partitioned at MCs Crossbar to L1s! 1.2 TeraFLOPS 35
36 R770 Die! 260mm2 ->small! 956 MTransistors! ~1 Billion Transistors 36
37 R770 Die usage! Red 10 SIMDs! Orange 64 z/stencil 40 texture 37
38 DirectX 11 Direct3D 11 Tessellation Hull Shader Works on control points Tessellator - Fixed Function Generates new vertices Domain Shader Works on vertices Geometry Shader (DX10) Works on primitives (triangles) Compute Shader mmxi mcd
39 Multi-Graphics core nvidia GF100 (Fermi) GeForce GTX 480 Released March, x Triangle rate, 4 rasterization engines 4 GPC x 3-4 SM x 32 scalars = MHz Memory Hierarchy New L1-L2 cache for shader read/write L1 cache (per SM) is 16 or 48 KB 768KB L2 cache is coherent and shared with texture and other parts of the graphics pipeline mmxii mcd
40 Comparing GPU shaders NVIDIA GeForce GTX 480 SM ATI Radeon HD 5870 SIMD-engine Fetch/ Decode Fetch/ Decode Fetch/ Decode Execution contexts (128 KB) Shared memory (16+48 KB) Execution contexts (256 KB) Shared memory (32 KB) 15 Streaming Multiprocessors 20 SIMD-engines mmxii mcd Images courtesy Kayvon Fatahalian
41 Comparing CPUs and GPUs Core Core Core Core CPU L2 L2 L2 L2 Shared L3 cache GPU cache cache cache cache cache cache cache cache
42 GPUs in Nvidia Volta architecture, 12nm Nvidia Tesla V100 (GV100) 15 TeraFLOPS (32bit FP) AMD Radeon Pro Vega Consoles Microsoft Xbox One X (AMD) Sony Playstation Pro (AMD) Nintendo Switch Nvidia Tegra Mobile ARM Mali Bifrost Architecture Intel discreet GPUs coming mmxvi mcd
43 Next 5-10 years Continued upgrade cycle for GPUs? Importance of parallel programming languages CUDA, OpenCL, GPU Compute killer app is here AI/Deep Learning! Nvidia Volta - Tensor Cores (FP16) mmxvi mcd
44 Summary GPU shaders grew from low precision simple programs GPU ALU is much simpler than CPUs SIMD allows less hardware for many instructions GPUs hide memory latency with 1000s of threads GPUs are massively parallel devices and can be used for general compute (GPGPU)
45 Next... Project Description Due Thursday Thursday Graphics architecture and OpenCL Next week - Tuesday - Guest Speaker Johan Grönqvist, ARM
46 References Henry Fuchs et. al. "A Heterogeneous Multiprocessor Graphics System Using Processor- Enhanced Memories," Proceedings of SIGGRAPH '89 Steven Molnar et. al. PixelFlow: high-speed rendering using image composition, Proceedings of SIGGRAPH 92 Kurt Akeley, "RealityEngine Graphics". Proceedings of SIGGRAPH '93 John Montrym et. al. InfiniteReality: a real-time graphics system, Proceedings of SIGGRAPH 97 Joel McCormack, Neon: A (Big) (Fast) Single-Chip 3D Workstation Graphics Accelerator WRL Research Report 98/1 (Revised 99) Lindholm et. al., A user-programmable vertex engine, Proceedings of SIGGRAPH 01 Montrym et. al. The GeForce 6800, IEEE Micro, March 05 Lindholm et. al. Nvidia tesla: A unified graphics and computing architecture, IEEE Micro, March- April 08 Larry Seiler et. al. Larrabee: A Many-Core x86 Architecture for Visual Computing, Proceedings of SIGGRAPH 08 Jeremy Sugerman et. al. GRAMPS: A Programming Model for Graphics Pipelines, ACM Transactions on Graphics January, 09 Timo Aila et. al. Understanding the Efficiency of Ray Traversal on GPUs. High-Performance Graphics 2009.
Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university
Graphics Architectures and OpenCL Michael Doggett Department of Computer Science Lund university Overview Parallelism Radeon 5870 Tiled Graphics Architectures Important when Memory and Bandwidth limited
More informationCSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
More informationArchitectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1
Architectures Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Overview of today s lecture The idea is to cover some of the existing graphics
More informationCS GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8. Markus Hadwiger, KAUST
CS 380 - GPU and GPGPU Programming Lecture 8+9: GPU Architecture 7+8 Markus Hadwiger, KAUST Reading Assignment #5 (until March 12) Read (required): Programming Massively Parallel Processors book, Chapter
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationChallenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008
Michael Doggett Graphics Architecture Group April 2, 2008 Graphics Processing Unit Architecture CPUs vsgpus AMD s ATI RADEON 2900 Programming Brook+, CAL, ShaderAnalyzer Architecture Challenges Accelerated
More informationCS427 Multicore Architecture and Parallel Computing
CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:
More informationSpring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett
Spring 2010 Prof. Hyesoon Kim AMD presentations from Richard Huddy and Michael Doggett Radeon 2900 2600 2400 Stream Processors 320 120 40 SIMDs 4 3 2 Pipelines 16 8 4 Texture Units 16 8 4 Render Backens
More informationGraphics Processing Unit Architecture (GPU Arch)
Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics
More informationAntonio R. Miele Marco D. Santambrogio
Advanced Topics on Heterogeneous System Architectures GPU Politecnico di Milano Seminar Room A. Alario 18 November, 2015 Antonio R. Miele Marco D. Santambrogio Politecnico di Milano 2 Introduction First
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationFrom Shader Code to a Teraflop: How GPU Shader Cores Work. Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian)
From Shader Code to a Teraflop: How GPU Shader Cores Work Jonathan Ragan- Kelley (Slides by Kayvon Fatahalian) 1 This talk Three major ideas that make GPU processing cores run fast Closer look at real
More informationPortland State University ECE 588/688. Graphics Processors
Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly
More informationWhat Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others
What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University * slides thanks to Kavita Bala & many others Final Project Demo Sign-Up: Will be posted outside my office after lecture today.
More informationCurrent Trends in Computer Graphics Hardware
Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)
More informationIntroduction to Multicore architecture. Tao Zhang Oct. 21, 2010
Introduction to Multicore architecture Tao Zhang Oct. 21, 2010 Overview Part1: General multicore architecture Part2: GPU architecture Part1: General Multicore architecture Uniprocessor Performance (ECint)
More informationThreading Hardware in G80
ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &
More informationFrom Shader Code to a Teraflop: How Shader Cores Work
From Shader Code to a Teraflop: How Shader Cores Work Kayvon Fatahalian Stanford University This talk 1. Three major ideas that make GPU processing cores run fast 2. Closer look at real GPU designs NVIDIA
More informationGRAPHICS HARDWARE. Niels Joubert, 4th August 2010, CS147
GRAPHICS HARDWARE Niels Joubert, 4th August 2010, CS147 Rendering Latest GPGPU Today Enabling Real Time Graphics Pipeline History Architecture Programming RENDERING PIPELINE Real-Time Graphics Vertices
More informationGeForce4. John Montrym Henry Moreton
GeForce4 John Montrym Henry Moreton 1 Architectural Drivers Programmability Parallelism Memory bandwidth 2 Recent History: GeForce 1&2 First integrated geometry engine & 4 pixels/clk Fixed-function transform,
More informationGraphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics
Why GPU? Chapter 1 Graphics Hardware Graphics Processing Unit (GPU) is a Subsidiary hardware With massively multi-threaded many-core Dedicated to 2D and 3D graphics Special purpose low functionality, high
More informationIntroduction to CUDA (1 of n*)
Agenda Introduction to CUDA (1 of n*) GPU architecture review CUDA First of two or three dedicated classes Joseph Kider University of Pennsylvania CIS 565 - Spring 2011 * Where n is 2 or 3 Acknowledgements
More informationX. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1
X. GPU Programming 320491: Advanced Graphics - Chapter X 1 X.1 GPU Architecture 320491: Advanced Graphics - Chapter X 2 GPU Graphics Processing Unit Parallelized SIMD Architecture 112 processing cores
More informationIntroduction to Modern GPU Hardware
The following content are extracted from the material in the references on last page. If any wrong citation or reference missing, please contact ldvan@cs.nctu.edu.tw. I will correct the error asap. This
More informationCornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary
Cornell University CS 569: Interactive Computer Graphics Introduction Lecture 1 [John C. Stone, UIUC] 2008 Steve Marschner 1 2008 Steve Marschner 2 NASA University of Calgary 2008 Steve Marschner 3 2008
More informationNVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield
NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host
More informationThe NVIDIA GeForce 8800 GPU
The NVIDIA GeForce 8800 GPU August 2007 Erik Lindholm / Stuart Oberman Outline GeForce 8800 Architecture Overview Streaming Processor Array Streaming Multiprocessor Texture ROP: Raster Operation Pipeline
More informationCONSOLE ARCHITECTURE
CONSOLE ARCHITECTURE Introduction Part 1 What is a console? Console components Differences between consoles and PCs Benefits of console development The development environment Console game design What
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More informationThis Unit: Putting It All Together. CIS 371 Computer Organization and Design. Sources. What is Computer Architecture?
This Unit: Putting It All Together CIS 371 Computer Organization and Design Unit 15: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital
More informationGraphics and Imaging Architectures
Graphics and Imaging Architectures Kayvon Fatahalian http://www.cs.cmu.edu/afs/cs/academic/class/15869-f11/www/ About Kayvon New faculty, just arrived from Stanford Dissertation: Evolving real-time graphics
More informationParallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)
Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Analyzing a 3D Graphics Workload Where is most of the work done? Memory Vertex
More informationUnit 11: Putting it All Together: Anatomy of the XBox 360 Game Console
Computer Architecture Unit 11: Putting it All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Milo Martin & Amir Roth at University of Pennsylvania! Computer Architecture
More informationReal - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský
Real - Time Rendering Graphics pipeline Michal Červeňanský Juraj Starinský Overview History of Graphics HW Rendering pipeline Shaders Debugging 2 History of Graphics HW First generation Second generation
More informationReal-Time Buffer Compression. Michael Doggett Department of Computer Science Lund university
Real-Time Buffer Compression Michael Doggett Department of Computer Science Lund university Project 3D graphics project Demo, Game Implement 3D graphics algorithm(s) C++/OpenGL(Lab2)/iOS/android/3D engine
More informationThis Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources
This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits
More informationReal-Time Rendering Architectures
Real-Time Rendering Architectures Mike Houston, AMD Part 1: throughput processing Three key concepts behind how modern GPU processing cores run code Knowing these concepts will help you: 1. Understand
More informationLecture 7: The Programmable GPU Core. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)
Lecture 7: The Programmable GPU Core Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011) Today A brief history of GPU programmability Throughput processing core 101 A detailed
More informationLecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)
Lecture 6: Texture Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011) Today: texturing! Texture filtering - Texture access is not just a 2D array lookup ;-) Memory-system implications
More informationGPU Architecture. Robert Strzodka (MPII), Dominik Göddeke G. TUDo), Dominik Behr (AMD)
GPU Architecture Robert Strzodka (MPII), Dominik Göddeke G (TUDo( TUDo), Dominik Behr (AMD) Conference on Parallel Processing and Applied Mathematics Wroclaw, Poland, September 13-16, 16, 2009 www.gpgpu.org/ppam2009
More informationWhat s New with GPGPU?
What s New with GPGPU? John Owens Assistant Professor, Electrical and Computer Engineering Institute for Data Analysis and Visualization University of California, Davis Microprocessor Scaling is Slowing
More informationParallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)
Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Today Finishing up from last time Brief discussion of graphics workload metrics
More informationGraphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal
Graphics Hardware, Graphics APIs, and Computation on GPUs Mark Segal Overview Graphics Pipeline Graphics Hardware Graphics APIs ATI s low-level interface for computation on GPUs 2 Graphics Hardware High
More informationCS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST
CS 380 - GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1 Markus Hadwiger, KAUST Reading Assignment #2 (until Feb. 17) Read (required): GLSL book, chapter 4 (The OpenGL Programmable
More informationAMD Radeon HD 2900 Highlights
C O N F I D E N T I A L 2007 Hot Chips 19 AMD s Radeon HD 2900 2 nd Generation Unified Shader Architecture Mike Mantor Fellow AMD Graphics Products Group michael.mantor@amd.com AMD Radeon HD 2900 Highlights
More informationCS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology
CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367
More informationOptimizing DirectX Graphics. Richard Huddy European Developer Relations Manager
Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most
More informationGraphics Hardware. Instructor Stephen J. Guy
Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!
More informationComputer Architecture
Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,
More informationGPU Basics. Introduction to GPU. S. Sundar and M. Panchatcharam. GPU Basics. S. Sundar & M. Panchatcharam. Super Computing GPU.
Basics of s Basics Introduction to Why vs CPU S. Sundar and Computing architecture August 9, 2014 1 / 70 Outline Basics of s Why vs CPU Computing architecture 1 2 3 of s 4 5 Why 6 vs CPU 7 Computing 8
More informationGPU! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room, Bld 20! 11 December, 2017!
Advanced Topics on Heterogeneous System Architectures GPU! Politecnico di Milano! Seminar Room, Bld 20! 11 December, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2 Introduction!
More informationProf. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University
Prof. Hakim Weatherspoon CS 3410, Spring 2015 Computer Science Cornell University Project3 Cache Race Games night Monday, May 4 th, 5pm Come, eat, drink, have fun and be merry! Location: B17 Upson Hall
More informationReal-Time Rendering (Echtzeitgraphik) Michael Wimmer
Real-Time Rendering (Echtzeitgraphik) Michael Wimmer wimmer@cg.tuwien.ac.at Walking down the graphics pipeline Application Geometry Rasterizer What for? Understanding the rendering pipeline is the key
More information1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7.
1. Introduction 2. Methods for I/O Operations 3. Buses 4. Liquid Crystal Displays 5. Other Types of Displays 6. Graphics Adapters 7. Optical Discs 1 Structure of a Graphics Adapter Video Memory Graphics
More informationCS195V Week 9. GPU Architecture and Other Shading Languages
CS195V Week 9 GPU Architecture and Other Shading Languages GPU Architecture We will do a short overview of GPU hardware and architecture Relatively short journey into hardware, for more in depth information,
More informationMultimedia in Mobile Phones. Architectures and Trends Lund
Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson
More informationWindowing System on a 3D Pipeline. February 2005
Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April
More informationECE 574 Cluster Computing Lecture 16
ECE 574 Cluster Computing Lecture 16 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 26 March 2019 Announcements HW#7 posted HW#6 and HW#5 returned Don t forget project topics
More informationGPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS
GPGPU, 1st Meeting Mordechai Butrashvily, CEO GASS Agenda Forming a GPGPU WG 1 st meeting Future meetings Activities Forming a GPGPU WG To raise needs and enhance information sharing A platform for knowledge
More informationGRAPHICS PROCESSING UNITS
GRAPHICS PROCESSING UNITS Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 4, John L. Hennessy and David A. Patterson, Morgan Kaufmann, 2011
More informationMulti-Processors and GPU
Multi-Processors and GPU Philipp Koehn 7 December 2016 Predicted CPU Clock Speed 1 Clock speed 1971: 740 khz, 2016: 28.7 GHz Source: Horowitz "The Singularity is Near" (2005) Actual CPU Clock Speed 2 Clock
More informationScheduling the Graphics Pipeline on a GPU
Lecture 20: Scheduling the Graphics Pipeline on a GPU Visual Computing Systems Today Real-time 3D graphics workload metrics Scheduling the graphics pipeline on a modern GPU Quick aside: tessellation Triangle
More informationBeyond Programmable Shading Keeping Many Cores Busy: Scheduling the Graphics Pipeline
Keeping Many s Busy: Scheduling the Graphics Pipeline Jonathan Ragan-Kelley, MIT CSAIL 29 July 2010 This talk How to think about scheduling GPU-style pipelines Four constraints which drive scheduling decisions
More informationGPU-Based Volume Rendering of. Unstructured Grids. João L. D. Comba. Fábio F. Bernardon UFRGS
GPU-Based Volume Rendering of João L. D. Comba Cláudio T. Silva Steven P. Callahan Unstructured Grids UFRGS University of Utah University of Utah Fábio F. Bernardon UFRGS Natal - RN - Brazil XVIII Brazilian
More informationAccelerating CFD with Graphics Hardware
Accelerating CFD with Graphics Hardware Graham Pullan (Whittle Laboratory, Cambridge University) 16 March 2009 Today Motivation CPUs and GPUs Programming NVIDIA GPUs with CUDA Application to turbomachinery
More informationGPGPU. Peter Laurens 1st-year PhD Student, NSC
GPGPU Peter Laurens 1st-year PhD Student, NSC Presentation Overview 1. What is it? 2. What can it do for me? 3. How can I get it to do that? 4. What s the catch? 5. What s the future? What is it? Introducing
More informationGPU for HPC. October 2010
GPU for HPC Simone Melchionna Jonas Latt Francis Lapique October 2010 EPFL/ EDMX EPFL/EDMX EPFL/DIT simone.melchionna@epfl.ch jonas.latt@epfl.ch francis.lapique@epfl.ch 1 Moore s law: in the old days,
More informationOptimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager
Optimizing for DirectX Graphics Richard Huddy European Developer Relations Manager Also on today from ATI... Start & End Time: 12:00pm 1:00pm Title: Precomputed Radiance Transfer and Spherical Harmonic
More informationScientific Computing on GPUs: GPU Architecture Overview
Scientific Computing on GPUs: GPU Architecture Overview Dominik Göddeke, Jakub Kurzak, Jan-Philipp Weiß, André Heidekrüger and Tim Schröder PPAM 2011 Tutorial Toruń, Poland, September 11 http://gpgpu.org/ppam11
More informationSpring 2009 Prof. Hyesoon Kim
Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on
More informationGPU Architecture and Function. Michael Foster and Ian Frasch
GPU Architecture and Function Michael Foster and Ian Frasch Overview What is a GPU? How is a GPU different from a CPU? The graphics pipeline History of the GPU GPU architecture Optimizations GPU performance
More informationA Data-Parallel Genealogy: The GPU Family Tree. John Owens University of California, Davis
A Data-Parallel Genealogy: The GPU Family Tree John Owens University of California, Davis Outline Moore s Law brings opportunity Gains in performance and capabilities. What has 20+ years of development
More informationSpring 2011 Prof. Hyesoon Kim
Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on
More informationAnatomy of AMD s TeraScale Graphics Engine
Anatomy of AMD s TeraScale Graphics Engine Mike Houston Design Goals Focus on Efficiency f(perf/watt, Perf/$) Scale up processing power and AA performance Target >2x previous generation Enhance stream
More informationPat Hanrahan. Modern Graphics Pipeline. How Powerful are GPUs? Application. Command. Geometry. Rasterization. Fragment. Display.
How Powerful are GPUs? Pat Hanrahan Computer Science Department Stanford University Computer Forum 2007 Modern Graphics Pipeline Application Command Geometry Rasterization Texture Fragment Display Page
More informationINSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing
UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática Architectures for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 12
More informationEfficient and Scalable Shading for Many Lights
Efficient and Scalable Shading for Many Lights 1. GPU Overview 2. Shading recap 3. Forward Shading 4. Deferred Shading 5. Tiled Deferred Shading 6. And more! First GPU Shaders Unified Shaders CUDA OpenCL
More informationBy: Tomer Morad Based on: Erik Lindholm, John Nickolls, Stuart Oberman, John Montrym. NVIDIA TESLA: A UNIFIED GRAPHICS AND COMPUTING ARCHITECTURE In IEEE Micro 28(2), 2008 } } Erik Lindholm, John Nickolls,
More informationComputer Architecture 计算机体系结构. Lecture 10. Data-Level Parallelism and GPGPU 第十讲 数据级并行化与 GPGPU. Chao Li, PhD. 李超博士
Computer Architecture 计算机体系结构 Lecture 10. Data-Level Parallelism and GPGPU 第十讲 数据级并行化与 GPGPU Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2017 Review Thread, Multithreading, SMT CMP and multicore Benefits of
More informationReal-Time Graphics Architecture
Real-Time Graphics Architecture Lecture 4: Parallelism and Communication Kurt Akeley Pat Hanrahan http://graphics.stanford.edu/cs448-07-spring/ Topics 1. Frame buffers 2. Types of parallelism 3. Communication
More informationParallel Programming for Graphics
Beyond Programmable Shading Course ACM SIGGRAPH 2010 Parallel Programming for Graphics Aaron Lefohn Advanced Rendering Technology (ART) Intel What s In This Talk? Overview of parallel programming models
More informationPowerVR Hardware. Architecture Overview for Developers
Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.
More informationCS 316: Multicore/GPUs
CS 316: Multicore/GPUs Kavita Bala Fall 2007 Computer Science Cornell University Announcements Core Wars will be out in the next couple of days Aim at having fun! Number of points allocated to it is small
More informationBeyond Programmable Shading Course ACM SIGGRAPH 2010 Panel: Fixed-Function Hardware
Beyond Programmable Shading Course ACM SIGGRAPH 2010 Panel: Fixed-Function Hardware What role will it play in future graphics architectures? Fixed-Function Hardware GPU programmers love to hate it Its
More informationShaders. Slide credit to Prof. Zwicker
Shaders Slide credit to Prof. Zwicker 2 Today Shader programming 3 Complete model Blinn model with several light sources i diffuse specular ambient How is this implemented on the graphics processor (GPU)?
More information! Readings! ! Room-level, on-chip! vs.!
1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads
More informationSqueezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques
Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques Jonathan Zarge, Team Lead Performance Tools Richard Huddy, European Developer Relations Manager ATI
More informationCSCI-GA Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs
CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com History of GPUs
More informationProgrammable GPUs. Real Time Graphics 11/13/2013. Nalu 2004 (NVIDIA Corporation) GeForce 6. Virtua Fighter 1995 (SEGA Corporation) NV1
Programmable GPUs Real Time Graphics Virtua Fighter 1995 (SEGA Corporation) NV1 Dead or Alive 3 2001 (Tecmo Corporation) Xbox (NV2A) Nalu 2004 (NVIDIA Corporation) GeForce 6 Human Head 2006 (NVIDIA Corporation)
More informationGPU Target Applications
John Montrym Henry Moreton GPU Target Applications (Graphics Processing Unit) 1 Interactive Gaming (50M units, 10M gamers) Cinematic quality rendering in real time. Digital Content Creation (DCC) (1M prof,
More informationGPU Computation Strategies & Tricks. Ian Buck NVIDIA
GPU Computation Strategies & Tricks Ian Buck NVIDIA Recent Trends 2 Compute is Cheap parallelism to keep 100s of ALUs per chip busy shading is highly parallel millions of fragments per frame 0.5mm 64-bit
More information3D Computer Games Technology and History. Markus Hadwiger VRVis Research Center
3D Computer Games Technology and History VRVis Research Center Lecture Outline Overview of the last ten years A look at seminal 3D computer games Most important techniques employed Graphics research and
More informationEE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin
EE382 (20): Computer Architecture - ism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez The University of Texas at Austin 1 Recap 2 Streaming model 1. Use many slimmed down cores to run in parallel
More informationXbox 360 Architecture. Lennard Streat Samuel Echefu
Xbox 360 Architecture Lennard Streat Samuel Echefu Overview Introduction Hardware Overview CPU Architecture GPU Architecture Comparison Against Competing Technologies Implications of Technology Introduction
More informationGPUs and GPGPUs. Greg Blanton John T. Lubia
GPUs and GPGPUs Greg Blanton John T. Lubia PROCESSOR ARCHITECTURAL ROADMAP Design CPU Optimized for sequential performance ILP increasingly difficult to extract from instruction stream Control hardware
More informationXbox 360 high-level architecture
11/2/11 Xbox 360 s Xenon vs. Playstation 3 s Cell Both chips clocked at a 3.2 GHz Architectural Comparison: Xbox 360 vs. Playstation 3 Prof. Aaron Lanterman School of Electrical and Computer Engineering
More informationIntroduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono
Introduction to CUDA Algoritmi e Calcolo Parallelo References q This set of slides is mainly based on: " CUDA Technical Training, Dr. Antonino Tumeo, Pacific Northwest National Laboratory " Slide of Applied
More informationCMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN
CMPE 665:Multiple Processor Systems CUDA-AWARE MPI VIGNESH GOVINDARAJULU KOTHANDAPANI RANJITH MURUGESAN Graphics Processing Unit Accelerate the creation of images in a frame buffer intended for the output
More informationHiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.
HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation
More informationNumerical Simulation on the GPU
Numerical Simulation on the GPU Roadmap Part 1: GPU architecture and programming concepts Part 2: An introduction to GPU programming using CUDA Part 3: Numerical simulation techniques (grid and particle
More information