GPU Memory Model

Adapted from: Aaron Lefohn, University of California, Davis. With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania. Updates performed by Gary J. Katz, University of Pennsylvania.

Review
[Figure: the graphics pipeline. The 3D application or game issues 3D API commands to the 3D API (OpenGL or Direct3D). Across the CPU-GPU boundary (AGP/PCIe), the GPU command and data stream reaches the GPU front end. The fixed-function pipeline consists of primitive assembly (vertex index stream in, assembled primitives out), rasterization and interpolation (pixel location stream out), and raster operations (pixel updates out to the frame buffer). The programmable vertex processor turns pre-transformed vertices into transformed vertices; the programmable fragment processor turns pre-transformed fragments into transformed fragments.]

Overview
- GPU Memory Model
- GPU Data Structure Basics
- Introduction to Framebuffer Objects
- Fragment Pipeline
- Vertex Pipeline

Memory Hierarchy
CPU and GPU Memory Hierarchy
- CPU side: Disk, CPU Main Memory, CPU Caches, CPU Registers
- GPU side: GPU Video Memory, GPU Caches, GPU Constant Registers, GPU Temporary Registers

CPU Memory Model
- At any program point
  - Allocate/free local or global memory
  - Random memory access
    - Registers: Read/write
    - Local memory: Read/write to stack
    - Global memory: Read/write to heap
    - Disk: Read/write to disk

GPU Memory Model
- Much more restricted memory access
  - Allocate/free memory only before computation
  - Limited memory access during computation (kernel)
    - Registers: Read/write
    - Local memory: Does not exist
    - Global memory: Read-only during computation; write-only at end of computation (pre-computed address)
    - Disk access: Does not exist
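The restrictions above can be modeled on the CPU. The following is a minimal sketch in Python, not GPU code: `run_kernel` and `average_neighbors` are hypothetical names introduced only for illustration. The kernel may read the input at any address (gather), but each invocation produces exactly one value, which is written at an output address fixed before the kernel runs.

```python
def run_kernel(kernel, src):
    """Model one GPU pass (hypothetical helper, not a real API)."""
    read_only = tuple(src)          # global memory is read-only during the pass
    out = [0.0] * len(read_only)    # output is allocated before computation
    for addr in range(len(out)):    # the output address is fixed per invocation
        out[addr] = kernel(read_only, addr)  # single write, at the end
    return out


def average_neighbors(mem, i):
    """Example kernel: gather three inputs, return one result."""
    left = mem[max(i - 1, 0)]               # clamp at the left border
    right = mem[min(i + 1, len(mem) - 1)]   # clamp at the right border
    return (left + mem[i] + right) / 3.0


result = run_kernel(average_neighbors, [3.0, 6.0, 9.0, 12.0])
print(result)
```

The kernel never chooses where its result goes; that is the "pre-computed address" restriction, and it is why scatter writes need a different formulation on this class of hardware.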

GPU Memory Model
Where is GPU Data Stored?
- Vertex buffer
- Frame buffer
- Texture
[Figure: pipeline from Vertex Buffer through Vertex Processor, Rasterizer, and Fragment Processor to Frame Buffer(s); Texture is also shown, with a label for VS 3.0 GPUs.]

Vertex Buffers
- GPU memory for vertex data
- Vertex data required to initiate render pass

Vertex Buffers
Supported Operations
- CPU interface
  - Allocate
  - Free
  - Copy CPU → GPU
  - Copy GPU → GPU (Render-to-vertex-array)
  - Bind for read-only vertex stream access
- GPU interface
  - Stream read (vertex program only)

Vertex Buffers
Limitations
- CPU
  - No copy GPU → CPU
  - No bind for read-only random access
- GPU
  - No random-access reads
  - No access from fragment programs

Textures
- Random-access GPU memory

Textures
Supported Operations
- CPU interface
  - Allocate
  - Free
  - Copy CPU → GPU
  - Copy GPU → CPU
  - Copy GPU → GPU (Render-to-texture)
  - Bind for read-only random access (vertex or fragment)
- GPU interface
  - Random read

Textures
Limitations
- No bind for vertex stream access

Next: GPU Data Structure Basics

GPU Arrays
Large 1D Arrays
- Current GPUs limit 1D array sizes
- Pack into 2D memory
- 1D-to-2D address translation
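One common form of 1D-to-2D address translation is row-major packing. The sketch below is in Python for clarity; the function names and the texture width of 4 are assumptions made for the example, not values from the slides.

```python
def addr_1d_to_2d(i, width):
    """Map a 1D array index to (x, y) in a row-major 2D texture."""
    return (i % width, i // width)


def addr_2d_to_1d(x, y, width):
    """Inverse mapping: (x, y) texel coordinates back to the 1D index."""
    return y * width + x


WIDTH = 4                          # hypothetical texture width
data = list(range(10))             # the "large" 1D array to pack
height = -(-len(data) // WIDTH)    # ceiling division: rows needed

# Pack the 1D data into a 2D grid; unused texels stay None.
texture = [[None] * WIDTH for _ in range(height)]
for i, value in enumerate(data):
    x, y = addr_1d_to_2d(i, WIDTH)
    texture[y][x] = value

print(addr_1d_to_2d(9, WIDTH))
```

In a fragment program the same modulo and divide would be done on texture coordinates; the last row is padded when the array length is not a multiple of the width.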

GPU Arrays
3D Arrays
- Problem
  - GPUs do not have 3D frame buffers
- Solutions
  1. Stack of 2D slices
  2. Multiple slices per 2D buffer

GPU Arrays
Problems With 3D Arrays
- Cannot read stack of 2D slices as 3D texture
- Must know which slices are needed in advance
- Solutions
  - Flat 3D textures
  - Need packing functions for coordinate transfer
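A packing function for a flat 3D texture can be sketched as follows. This Python example assumes the z-slices are tiled left to right, top to bottom, inside one 2D texture; the tiling order, the function name, and the 4×4×4 volume are illustrative assumptions.

```python
def addr_3d_to_2d(x, y, z, size_x, size_y, tiles_per_row):
    """Map voxel (x, y, z) to a texel in a 2D texture of tiled z-slices."""
    tile_x = z % tiles_per_row     # column of the tile holding slice z
    tile_y = z // tiles_per_row    # row of the tile holding slice z
    return (tile_x * size_x + x, tile_y * size_y + y)


# A 4x4x4 volume laid out as 2x2 tiles in an 8x8 texture.
SIZE, TILES_PER_ROW = 4, 2
texels = {
    addr_3d_to_2d(x, y, z, SIZE, SIZE, TILES_PER_ROW)
    for z in range(SIZE) for y in range(SIZE) for x in range(SIZE)
}

print(addr_3d_to_2d(1, 2, 3, SIZE, SIZE, TILES_PER_ROW))
```

Because every slice lives in the same 2D texture, a kernel can reach any neighbor in z without knowing in advance which slices it will need.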

GPU Arrays
Higher Dimensional Arrays
- Pack into 2D buffers
- N-D to 2D address translation
- Same problems as 3D arrays if data does not fit in a single 2D texture
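N-D to 2D translation can be done in two steps: linearize the N-D index in row-major order, then split the linear address into 2D coordinates. The Python sketch below is one possible scheme under that assumption; `addr_nd_to_2d` and the example shape are hypothetical.

```python
def addr_nd_to_2d(index, shape, width):
    """Map an N-D index to (x, y) in a 2D texture of the given width."""
    linear = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of range for shape")
        linear = linear * n + i    # row-major linearization
    return (linear % width, linear // width)


# A 2x3x4x5 array (120 elements) packed into a texture 16 texels wide.
shape = (2, 3, 4, 5)
print(addr_nd_to_2d((1, 2, 3, 4), shape, 16))
```

The last element has linear address 119, which lands at texel (7, 7), so this array needs an 16×8 texture.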

Sparse/Adaptive Data Structures
- Why?
  - Reduce memory pressure
  - Reduce computational workload
- Examples
  - Sparse matrices
    - Krueger et al., Siggraph 2003
    - Bolz et al., Siggraph 2003
  - Premoze et al., Eurographics 2003
  - Deformable implicit surfaces (sparse volumes/PDEs)
    - Lefohn et al., IEEE Visualization 2003 / TVCG 2004
  - Adaptive radiosity solution (Coombe et al.)

Sparse/Adaptive Data Structures
Basic Idea
- Pack active data elements into GPU memory
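The basic idea can be illustrated generically. This Python sketch packs only the active elements of a grid into a dense array and keeps an index map from original coordinates to packed slots. The cited papers each use their own layouts; the names `pack_active` and `lookup`, and the rule that "active" means nonzero, are assumptions for this example only.

```python
def pack_active(grid, is_active):
    """Pack active elements densely; return (packed values, index map)."""
    packed = []
    index_map = {}                 # (x, y) in the full grid -> packed slot
    for y, row in enumerate(grid):
        for x, value in enumerate(row):
            if is_active(value):
                index_map[(x, y)] = len(packed)
                packed.append(value)
    return packed, index_map


grid = [
    [0, 0, 5],
    [0, 7, 0],
    [2, 0, 0],
]
packed, index_map = pack_active(grid, lambda v: v != 0)


def lookup(x, y, default=0):
    """Read through the index map; inactive cells return the default."""
    slot = index_map.get((x, y))
    return default if slot is None else packed[slot]


print(packed, lookup(1, 1), lookup(0, 0))
```

Only three of the nine cells are stored and processed, which is the memory and workload saving the slide refers to. On a GPU the index map would itself live in a texture and be read with a dependent texture lookup.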

GPU Data Structures
Conclusions
- Fundamental GPU memory primitive is a fixed-size 2D array
- GPGPU needs more general memory model
- Building and modifying complex GPU-based data structures is an open research topic

Framebuffer Objects
- What is an FBO?
  - A struct that holds pointers to memory objects
  - Each bound memory object can be a framebuffer rendering surface
  - Platform-independent
[Figure: a Framebuffer Object pointing to a Texture (RGBA) and a Renderbuffer (Depth).]

Framebuffer Objects
- Which memory can be bound to an FBO?
  - Textures
  - Renderbuffers
    - Depth, stencil, color
    - Traditional write-only framebuffer

The fragment pipeline
- Input: Fragment Attributes (interpolated from vertex information)
  - Color: R G B A
  - Position: X Y Z W
  - Texture coordinates: X Y [Z] -
  - Texture coordinates: X Y [Z] -
  - 32 bits = float, 16 bits = half
- Input: Texture Image
  - X Y Z W: each element of texture is a 4D vector
  - Textures can be square or rectangular (power-of-two or not)

The fragment pipeline
- Input: Uniform parameters
  - Can be passed to a fragment program like normal parameters
  - Set in advance, before the fragment program executes
  - Example: a counter that tracks which pass the algorithm is in
- Input: Constant parameters
  - Fixed inside the program
  - E.g. float4 v = float4(1.0, 1.0, 1.0, 1.0)
  - Examples: 3.14159..., size of compute window

The fragment pipeline
- Math ops: USE THEM!
  - cos(x)/log2(x)/pow(x,y)
  - dot(a,b)
  - mul(v, M)
  - sqrt(x)
  - cross(u, v)
  - Using built-in ops is more efficient than writing your own
- Swizzling/masking: an easy way to move data around.
    v1 = (4,-2,5,3); // Initialize
    v2 = v1.yx;      // v2 = (-2,4)
    s = v1.w;        // s = 3
    v3 = s.rrr;      // v3 = (3,3,3)
- Write masking:
    v4 = (1,5,3,2);
    v4.ar = v2;      // v4 = (4,5,3,-2)
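The swizzle and write-mask semantics can be checked with a small CPU model. This is a Python sketch, not shader code; `swizzle` and `write_mask` are hypothetical helpers that mimic the `.yx` and `.ar` notation on 4-component tuples, with r/g/b/a aliasing x/y/z/w.

```python
# Component names: xyzw and rgba address the same four slots.
CHANNELS = {"x": 0, "y": 1, "z": 2, "w": 3,
            "r": 0, "g": 1, "b": 2, "a": 3}


def swizzle(v, pattern):
    """Read components of v in the order given by pattern, e.g. 'yx'."""
    return tuple(v[CHANNELS[c]] for c in pattern)


def write_mask(dst, pattern, src):
    """Write src into the components of dst named by pattern, e.g. 'ar'."""
    out = list(dst)
    for c, value in zip(pattern, src):
        out[CHANNELS[c]] = value
    return tuple(out)


v1 = (4, -2, 5, 3)
v2 = swizzle(v1, "yx")                 # second, then first component
s = swizzle(v1, "w")[0]                # single component as a scalar
v3 = (s,) * 3                          # s.rrr replicates the scalar
v4 = write_mask((1, 5, 3, 2), "ar", v2)

print(v2, s, v3, v4)
```

With the mask `.ar`, the first source value goes to alpha and the second to red, so green and blue keep their old values and `v4` becomes (4, 5, 3, -2).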

The fragment pipeline
    float4 v = tex2D(img, float2(x,y))
- Texture access is like an array lookup.
- The value in v can be used to perform another lookup! This is called a dependent read.
- Texture reads (and dependent reads) are expensive, and the number allowed differs between GPUs. Use them wisely!
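A dependent read is an array lookup whose address comes from an earlier lookup. The Python sketch below models textures as nested lists; the `tex2d` helper and the two tiny textures are made up for illustration and use integer texel coordinates rather than normalized ones.

```python
def tex2d(img, x, y):
    """Model of a texture fetch with integer texel coordinates."""
    return img[y][x]


# First texture stores coordinates into the second texture.
index_tex = [
    [(1, 0), (0, 1)],
    [(1, 1), (0, 0)],
]
data_tex = [
    [10, 20],
    [30, 40],
]


def dependent_read(x, y):
    u, v = tex2d(index_tex, x, y)   # first read yields an address
    return tex2d(data_tex, u, v)    # second read depends on the first


print([dependent_read(x, y) for y in range(2) for x in range(2)])
```

The second fetch cannot start until the first has returned, which is one reason dependent reads cost more than independent ones.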

The fragment pipeline
Control flow:
- (<test>)?a:b operator
- if-then-else conditional
  - [nv3x] Both branches are executed, and the condition code is used to decide which value is written to the output register.
  - [nv40] True conditionals
- for-loops and do-while
  - [nv3x] Limited to what can be unrolled (i.e. no variable loop limits)
  - [nv40] True looping
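The nv3x behavior, where both branches run and a condition selects the result, can be imitated on the CPU. This Python sketch is only a model of that idea; the function names and the example arithmetic are assumptions, and the `calls` list exists purely to show that both sides are evaluated.

```python
calls = []  # records which branch bodies actually ran


def then_branch(x):
    calls.append("then")
    return x * 2.0


def else_branch(x):
    calls.append("else")
    return -x * 0.5


def predicated_if(x):
    """Evaluate both branches, then select one value by the condition."""
    a = then_branch(x)        # always executed
    b = else_branch(x)        # always executed
    return a if x >= 0 else b # the condition only picks the output


print(predicated_if(3.0), predicated_if(-4.0), calls)
```

Under this model the cost of a conditional is the sum of both branches, so on such hardware branching does not save work.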

The fragment pipeline
- Fragment programs use call-by-result
    out float4 result : COLOR
    // Do computation
    result = <final answer>
- Notes:
  - Only output color can be modified
  - Textures cannot be written
  - Setting different values in different channels of result can be useful for debugging

The Vertex Pipeline
- Input: vertices
  - position, color, texture coords.
- Input: uniform and constant parameters.
  - Matrices can be passed to a vertex program.
  - Lighting/material parameters can also be passed.

The Vertex Pipeline
- Operations:
  - Math/swizzle ops
  - Matrix operators
  - Flow control (as before)
  - [nv3x] No access to textures.
- Output:
  - Modified vertices (position, color)
  - Vertex data transmitted to primitive assembly.

Vertex programs are useful
- We can replace the entire geometry transformation portion of the fixed-function pipeline.
- Vertex programs are used to change vertex coordinates (move objects around).
- There are many fewer vertices than fragments: shifting operations to vertex programs improves overall pipeline performance.
- Much of shader processing happens at the vertex level.
- We have access to the original scene geometry.
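Replacing the fixed-function transformation amounts to multiplying each vertex position by a matrix supplied as a uniform parameter. The following Python sketch illustrates this on the CPU; `mat_vec`, `translation`, and `vertex_program` are hypothetical names, and the code treats positions as column vectors (M·v), which is an assumption of this example rather than something the slides specify.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def translation(tx, ty, tz):
    """Homogeneous translation matrix."""
    return [
        [1, 0, 0, tx],
        [0, 1, 0, ty],
        [0, 0, 1, tz],
        [0, 0, 0, 1],
    ]


def vertex_program(position, transform):
    """Runs once per vertex; the matrix plays the role of a uniform."""
    return mat_vec(transform, position)


vertices = [(0, 0, 0, 1), (1, 0, 0, 1), (0, 1, 0, 1)]   # one triangle
move = translation(2, 3, 0)
transformed = [vertex_program(v, move) for v in vertices]

print(transformed)
```

The triangle has three vertices but may cover thousands of fragments, so work done here runs far fewer times than the same work in a fragment program.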

Vertex programs are not useful
- Fragment programs allow us to exploit the full parallelism of the GPU pipeline ("a processor at every pixel").
- Rule of thumb: if a computation requires intensive calculation, it should probably be in the fragment processor. If it requires more geometric/graphic computing, it should be in the vertex processor.

Conclusions
- GPU Memory Model Evolving
  - Writable GPU memory forms loop-back in an otherwise feed-forward pipeline
  - Memory model will continue to evolve as GPUs become more general data-parallel processors
- Data Structures
  - Basic memory primitive is limited-size, 2D texture
  - Use address translation to fit all array dimensions into 2D
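The loop-back formed by writable GPU memory is commonly used as a "ping-pong" between two buffers: each pass reads one and writes the other, then the roles swap. The slides do not name this pattern; the Python sketch below is a CPU model of it, and `ping_pong` and the halving kernel are hypothetical.

```python
def ping_pong(initial, kernel, passes):
    """Run several passes, swapping read and write buffers between them."""
    read_buf = list(initial)
    write_buf = [0.0] * len(initial)
    for _ in range(passes):
        # One pass: the whole output is produced from the read-only input.
        for i in range(len(read_buf)):
            write_buf[i] = kernel(read_buf, i)
        # Loop-back: the output of this pass becomes the next pass's input.
        read_buf, write_buf = write_buf, read_buf
    return read_buf


def halve(src, i):
    """Example kernel: each element is halved per pass."""
    return src[i] / 2.0


print(ping_pong([8.0, 4.0], halve, 3))
```

Within a pass the pipeline stays feed-forward; the feedback exists only between passes, when the rendered result is bound as the next input.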