GPU Architecture and Function. Michael Foster and Ian Frasch

Overview
- What is a GPU?
- How is a GPU different from a CPU?
- The graphics pipeline
- History of the GPU
- GPU architecture
- Optimizations
- GPU performance trends
- Current development

What is a GPU?
- A dedicated graphics chip that handles all processing required for rendering 3D objects on the screen
- Typically placed on a video card, which contains its own memory and display interfaces (HDMI, DVI, VGA, etc.)
- Primitive GPUs were developed in the 1980s; the first complete GPUs appeared in the mid-1990s

Systems level view
- Video card is connected to the motherboard through PCI Express or AGP (Accelerated Graphics Port)
- Northbridge chip enables data transfer between the CPU and GPU
- Graphics memory on the video card contains the pixel RGB data for each frame

How is a GPU different from a CPU?
- Throughput is more important than latency
  o High throughput is needed for the huge number of computations required for graphics
  o Latency is less of a concern because the human visual system operates on a much longer time scale: about 16.7 ms per frame at a 60 Hz refresh rate
- Long pipelines with many stages; a single instruction may take thousands of cycles to get through the pipeline
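The frame-time figure on this slide follows directly from the refresh rate. The sketch below works through that arithmetic and compares a pipeline latency of a few thousand cycles with the number of cycles in one frame. The 1 GHz clock and the function names are assumptions chosen only for illustration, not figures from the slides.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / refresh_hz


def cycles_per_frame(clock_hz: float, refresh_hz: float) -> float:
    """Clock cycles that elapse during one frame interval."""
    return clock_hz / refresh_hz


# Hypothetical 1 GHz clock, used only to give a sense of scale.
budget = frame_budget_ms(60)
cycles = cycles_per_frame(1e9, 60)
print(f"{budget:.2f} ms per frame, {cycles:,.0f} cycles per frame")

# A pipeline that takes a few thousand cycles per instruction still uses
# only a tiny fraction of the frame interval, so throughput dominates.
print(f"5000-cycle latency = {5000 / cycles:.4%} of one frame")
```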

How is a GPU different from a CPU?
- Extremely parallel
  o Different pixels and elements of the image can be operated on independently
  o Hundreds of cores execute at the same time to take advantage of this fundamental parallelism

Inputs and Outputs
- Inputs to GPU (from the CPU/memory):
  o Vertices (3D coordinates) of objects
  o Texture data
  o Lighting data
- Outputs from GPU: frame buffer
  o Placed in a specific section of graphics memory
  o Contains RGB values for each pixel on the screen
  o Data is sent directly to the display

The Graphics Pipeline: A Visual
[Figure: pipeline stages from 3D coordinates to the pixel shader]

The Graphics Pipeline The GPU completes every stage of this computational pipeline

Transformations
- Camera transformation
  o Convert vertices from 3D world coordinates to 3D camera coordinates, with the camera (user view) as the origin
- Projection transformation
  o Convert vertices from 3D camera coordinates to 2D screen coordinates that the user will see
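As a minimal sketch of the two transformations, the code below moves a vertex into camera space and then applies a perspective projection. It makes simplifying assumptions that are not in the slides: the camera axes are aligned with the world axes (translation only, no rotation), the camera looks along +z, and the image plane sits at a hypothetical `focal_length`.

```python
def world_to_camera(vertex, camera_pos):
    """Translate a world-space vertex so that the camera sits at the origin.

    Assumes the camera axes are aligned with the world axes (no rotation).
    """
    return tuple(v - c for v, c in zip(vertex, camera_pos))


def project(vertex_cam, focal_length=1.0):
    """Perspective-project a camera-space vertex onto the plane z = focal_length."""
    x, y, z = vertex_cam
    if z <= 0:
        raise ValueError("vertex is at or behind the camera")
    # Dividing by depth makes distant objects appear smaller.
    return (focal_length * x / z, focal_length * y / z)


vertex_world = (1.0, 2.0, 5.0)
camera_pos = (0.0, 0.0, 1.0)
vertex_cam = world_to_camera(vertex_world, camera_pos)
print(vertex_cam, project(vertex_cam))
```

A full camera transform would also include a rotation; real pipelines usually express both steps as 4x4 matrices in homogeneous coordinates.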

Illustration of 3D-2D Projection (with overlapping vertices)
Depth values in the Z buffer determine which triangle is visible when two vertices map to the same 2D coordinate
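The depth test can be sketched in a few lines. The example assumes fragments arrive as hypothetical `(x, y, depth, color)` tuples and that a smaller depth value means closer to the camera; both conventions are illustrative choices rather than details from the slides.

```python
def resolve_visibility(fragments, width, height):
    """Keep, for every pixel, the color of the closest fragment seen so far.

    fragments: iterable of (x, y, depth, color); smaller depth = closer.
    """
    infinity = float("inf")
    depth_buffer = [[infinity] * width for _ in range(height)]
    color_buffer = [[None] * width for _ in range(height)]
    for x, y, depth, color in fragments:
        # Depth test: overwrite only if this fragment is nearer.
        if depth < depth_buffer[y][x]:
            depth_buffer[y][x] = depth
            color_buffer[y][x] = color
    return color_buffer


fragments = [
    (0, 0, 5.0, "red"),     # far triangle
    (0, 0, 2.0, "blue"),    # nearer triangle covering the same pixel
    (1, 0, 3.0, "green"),
    (0, 0, 4.0, "yellow"),  # arrives later but is farther than blue
]
print(resolve_visibility(fragments, width=2, height=1))
```

Because the test compares against the stored depth, the result does not depend on the order in which the triangles are drawn.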

Transformations
- These transformations only modify vertices, so they are done by vertex shaders
- Transform computations are heavy on matrix multiplication
- Each vertex can be transformed independently
  o Data parallelism
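To illustrate why vertex transforms are data parallel, the sketch below applies one 4x4 matrix to a list of vertices in homogeneous coordinates. The helper names and the translation matrix are illustrative assumptions. The key point is that each call to `mat_vec` reads only its own vertex, so the calls could run on separate cores in any order.

```python
def mat_vec(matrix, vertex):
    """Multiply a 4x4 matrix by a 4-component homogeneous vertex."""
    return tuple(
        sum(matrix[row][col] * vertex[col] for col in range(4))
        for row in range(4)
    )


def translation(tx, ty, tz):
    """Build a 4x4 translation matrix."""
    return (
        (1, 0, 0, tx),
        (0, 1, 0, ty),
        (0, 0, 1, tz),
        (0, 0, 0, 1),
    )


def transform_all(matrix, vertices):
    """Transform every vertex; no iteration depends on any other."""
    return [mat_vec(matrix, v) for v in vertices]


vertices = [(0, 0, 0, 1), (1, 1, 1, 1)]
print(transform_all(translation(1, 2, 3), vertices))
```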

Example renderings
- Vertices (point cloud)
- Primitives (triangles)
- Textures

More renderings
- Simple shaded 6-vertex cube; each vertex has a color associated with it, and the pixel shader blends the colors
- Advanced rendering
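One common way to blend per-vertex colors across a triangle is barycentric interpolation. The sketch below assumes the interpolation weights for a pixel are already known; on real hardware they come from the rasterizer. The function name and color values are hypothetical.

```python
def blend_colors(weights, colors):
    """Blend three RGB vertex colors using barycentric weights.

    weights: three non-negative numbers that sum to 1.
    colors: three (r, g, b) tuples, one per triangle vertex.
    """
    return tuple(
        sum(w * color[channel] for w, color in zip(weights, colors))
        for channel in range(3)
    )


red, green, blue = (255, 0, 0), (0, 255, 0), (0, 0, 255)

# A pixel exactly on the first vertex takes that vertex's color.
print(blend_colors((1.0, 0.0, 0.0), (red, green, blue)))
# A pixel halfway along the red-green edge gets an even mix of the two.
print(blend_colors((0.5, 0.5, 0.0), (red, green, blue)))
```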

Fixed-Function to Programmable
- Earlier GPUs were fixed-function hardware pipelines
  o Software developers could set parameters (textures, light reflection colors, blend modes), but the function was completely controlled by the hardware
- In newer GPUs, portions of the pipeline are completely programmable
  o Pipeline stages are now programs running on processor cores inside the GPU, instead of fixed-function ASICs
  o Vertex shaders = programs running on vertex processors; fragment shaders = programs running on fragment processors
  o However, some stages are still fixed function (e.g. rasterization)

History of the GPU
- 1996: 3DFX Voodoo graphics card implements texture mapping, z-buffering, and rasterization, but no vertex processing
- 1999: GPUs implement the full graphics pipeline in fixed-function hardware (Nvidia GeForce 256, ATI Radeon 7500)
- 2001: Programmable shader pipelines (Nvidia GeForce 3)
- 2006: Unified shader architecture (ATI Radeon R600, Nvidia GeForce 8, Intel GMA X3000, ATI Xenos for Xbox 360)
- 2010: General Purpose GPUs for non-graphical compute-intensive applications, Nvidia CUDA parallel programming API
- 2014: Unprecedented compute power
  o Nvidia GeForce GTX Titan Z: 8.2 TFLOPS
  o AMD Radeon R9 295X2 (dual GPU card): 11.5 TFLOPS

GPU Architecture (programmable)
[Figure: Nvidia GeForce 6 series (2004)]

Shader processors
[Figures: vertex shader core; fragment/pixel shader core]

Single Instruction, Multiple Data (SIMD)
- Shader processors are generally SIMD
- A single instruction is executed on every vertex or pixel
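The SIMD idea can be modeled in software as a loop over instructions, with each instruction applied to every data lane before the next one starts. This is only a conceptual sketch: the `run_simd` name and the two-instruction program are hypothetical, and real hardware executes the lanes simultaneously rather than in a Python loop.

```python
def run_simd(program, lanes):
    """Apply each instruction to all lanes in lock step.

    program: list of single-argument functions (the "instructions").
    lanes: one data value per vertex or pixel.
    """
    state = list(lanes)
    for instruction in program:
        # One instruction, many data elements.
        state = [instruction(value) for value in state]
    return state


# Hypothetical two-instruction shader: scale, then add a bias.
program = [lambda v: v * 2, lambda v: v + 1]
print(run_simd(program, [1, 2, 3, 4]))
```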

Functional block diagram

Optimizations
- Combining different types of shader cores into a single unified shader core
- Dynamic task scheduling to balance the load on all cores

Workload Distribution
- Frames with many edges (vertices) require more vertex shaders
- Frames with large primitives require more pixel shaders

Solution: Unified Shader
- Pixel shaders, geometry shaders, and vertex shaders run on the same core: a unified shader core
  o Unified shaders limit idle shader cores
  o Instruction set is shared across all shader types
  o The program determines the type of shader
- Modern GPUs all use unified shader cores
- Shader cores are programmed using graphics APIs like OpenGL and Direct3D
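A rough, idealized model shows why a unified design limits idle cores. It assumes work divides perfectly among cores and ignores scheduling overhead. The work units and core counts are made-up numbers for illustration only.

```python
def split_time(vertex_work, pixel_work, vertex_cores, pixel_cores):
    """Frame time with dedicated cores: the slower group sets the pace."""
    return max(vertex_work / vertex_cores, pixel_work / pixel_cores)


def unified_time(vertex_work, pixel_work, total_cores):
    """Frame time when any core can run any shader type."""
    return (vertex_work + pixel_work) / total_cores


# Hypothetical vertex-heavy frame: 800 units of vertex work, 200 of pixel work.
print(split_time(800, 200, vertex_cores=4, pixel_cores=4))
print(unified_time(800, 200, total_cores=8))
```

In the split configuration the four pixel cores finish early and sit idle while the vertex cores are still busy; the unified configuration spreads the same total work over all eight cores.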

Static Task Distribution
- Unequal task distribution leads to inefficient hardware usage
- Parallel processors should handle tasks of equal complexity

Dynamic Task Distribution
- Tasks are dynamically distributed among pixel shaders
- Slots for output are pre-allocated in the output FIFO
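The difference between the two distribution schemes can be sketched with a small simulation. Static distribution is modeled here as round-robin assignment decided in advance, and dynamic distribution as handing each task to whichever worker becomes free first. Both models and the task costs are illustrative assumptions; the output FIFO that preserves result order is not modeled.

```python
import heapq


def static_makespan(costs, workers):
    """Round-robin assignment fixed in advance; returns total finish time."""
    loads = [0] * workers
    for index, cost in enumerate(costs):
        loads[index % workers] += cost
    return max(loads)


def dynamic_makespan(costs, workers):
    """Each task goes to the worker that becomes free earliest."""
    free_at = [0] * workers
    heapq.heapify(free_at)
    for cost in costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


# Hypothetical mix of expensive (8) and cheap (1) tasks.
costs = [8, 1, 8, 1, 8, 1, 1, 1]
print(static_makespan(costs, workers=2), dynamic_makespan(costs, workers=2))
```

With these costs, round-robin sends every expensive task to the same worker, while the dynamic scheme lets the other worker pick up part of the heavy load.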

Modern GPU Hardware
[Figure: basic blocks in a modern GPU, with color key]

Unified Architecture example: Nvidia GeForce 8 (2006)

Unified Architecture example: AMD Radeon R600 (2006)

GPU Compute Power: Recent History

GPU Memory Bandwidth: Recent History

Current Development and Future
- GPU fixed-function units are being abstracted away
- Newest versions of CUDA and OpenGL include instructions for general-purpose computing
- Future GPUs will resemble multi-core CPUs with hyper-threading

Nvidia Fermi (2010)
- 16 cores with 32-way hyper-threading per core
- 1.5 TFLOPS (peak)
