Graphics Hardware. A Graphics Processing Unit (GPU) is subsidiary hardware with a massively multi-threaded, many-core design, dedicated to 2D and 3D graphics.

Why GPU? Chapter 1

Graphics Hardware. A Graphics Processing Unit (GPU) is subsidiary hardware with a massively multi-threaded, many-core design, dedicated to 2D and 3D graphics. It is special-purpose: low functionality but high performance, with H/W-accelerated graphics operations.

Graphics Hardware: CPUs vs. GPUs. CPUs are optimized for high performance on sequential code; their threading model is coarse and heavyweight. GPUs are optimized for the highly data-parallel nature of graphics computation; their threading model is fine-grained and extremely lightweight.

Computational Power of GPU. GPUs are getting faster. CPUs: annual growth 1.5x, decade growth 60x. GPUs: annual growth >2.0x, decade growth >1000x.
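The decade figures follow from compounding the annual rates over ten years. A minimal Python sketch of that arithmetic (the function name `decade_growth` is illustrative and not from the slides):

```python
def decade_growth(annual_factor: float, years: int = 10) -> float:
    """Compound an annual growth factor over a number of years."""
    return annual_factor ** years


cpu = decade_growth(1.5)  # about 57.7, which the slide rounds to 60x
gpu = decade_growth(2.0)  # exactly 1024, consistent with "> 1000x"
print(f"CPU: {cpu:.1f}x per decade, GPU: {gpu:.0f}x per decade")
```

The gap between the two curves is what matters here: a modest difference in annual rate compounds into more than an order of magnitude over a decade.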

Why GPU is Trendy. A massively parallel architecture. Modern GPUs are deeply programmable, with programmable pixel, vertex, and geometry engines. Modern GPUs support real precision: 32-bit floating point throughout the pipeline.

Why GPU is Trendy. Dedicated instructions for graphics tasks: useful operations on vectors, matrices, and textures. Extremely fast filtering: linear and some anisotropic interpolation is implemented in hard-wired logic.
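To make "linear interpolation" concrete, here is a software sketch of bilinear texture filtering, the operation a GPU performs in fixed-function hardware on each texture fetch. The function name, the nested-list texture layout, and the clamp-to-edge addressing are assumptions chosen for illustration.

```python
import math


def bilinear_sample(texture, u, v):
    """Sample a 2D grid of floats at continuous texel coordinates (u, v).

    Coordinates are clamped to the texture edges (an assumed addressing mode).
    """
    height, width = len(texture), len(texture[0])
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = u - x0, v - y0
    # Blend horizontally on the two neighboring rows, then vertically.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


tex = [[0.0, 1.0],
       [2.0, 3.0]]
center = bilinear_sample(tex, 0.5, 0.5)  # equal blend of all four texels
```

In software this costs four reads and several multiplies per sample; on the GPU the same result comes back from a single fetch.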

Limitations: H/W Restrictions. On-board memory size is restricted: up to 4GB, usually <1GB. Support for flexible memory manipulation is insufficient. Programmability is still restricted in a number of ways, for example limited support for branch divergence in loops or conditional clauses.
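One way to see why branch divergence is costly is a toy cost model. It assumes that a group of threads executes in lockstep and that, when threads disagree on a condition, the group runs both sides with the inactive threads masked off. The group width and cycle counts below are hypothetical.

```python
def lockstep_cost(conditions, cost_if, cost_else):
    """Cycles spent by one lockstep thread group on an if/else.

    If every thread agrees, only one path runs; otherwise both paths run.
    """
    takes_if = any(conditions)
    takes_else = not all(conditions)
    return (cost_if if takes_if else 0) + (cost_else if takes_else else 0)


uniform = lockstep_cost([True] * 8, cost_if=10, cost_else=30)
divergent = lockstep_cost([True] * 7 + [False], cost_if=10, cost_else=30)
print(uniform, divergent)
```

Under this model a single disagreeing thread makes the whole group pay for both paths.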

Limitations: Difficult to Use. GPUs are designed for and driven by video games. The underlying architectures are inherently data parallel, rapidly evolving (even in the basic feature set!), and largely secret. You can't simply port CPU code. Good news: it's getting better (GPGPU).

Limitations: Matter of Choice. H/W side: vendor wars. Semantically identical functionality is implemented by different methods in each internal architecture, and vendors are reluctant to open their internal architectures. API side: too obsolete, too fluctuating, too complex. SGI OpenGL: the standardization process is too slow. Microsoft Direct3D: fast adoption of new technologies. GPGPU languages.

Summary. The GPU is a massively parallel architecture. Many problems map well to GPU-style computing. GPUs have a large amount of arithmetic capability and an increasing amount of programmability in the pipeline. Challenge: how do we make the best use of GPU hardware? Think in parallel.

Traditional Graphics Pipeline

Lighting and Rasterization. The most time-consuming part in the early CG era. H/W to accelerate pixel processing was introduced: 3Dfx Voodoo (1996), no VGA; 3DLabs Permedia (1996), H/W support for the OpenGL API; NVIDIA Riva (1997). Vertices are processed on the CPU.

Transformation & Lighting. Hardware-accelerated vertex processing. Vertex data is stored in graphics memory. Microsoft Direct3D 7 (1999). NVIDIA GeForce256 (1999).

Programmable Shader. Fixed-pipeline acceleration H/W is inflexible: fixed vertex transformation with the WVP (world-view-projection) matrix; fixed shading algorithms, filtering methods, and so on. Meanwhile, demand for high-quality rendering exploded: complicated texture mapping and filtering methods, various light sources, and finer shading methods.
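The fixed vertex transformation amounts to multiplying every vertex by one combined matrix. The sketch below assumes a column-vector convention (Direct3D documentation conventionally uses row vectors, which reverses the multiplication order). The helper names are illustrative, and a uniform scale stands in for a real projection matrix.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Apply a 4x4 matrix to a column vector [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def uniform_scale(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]


world = translation(1, 0, 0)      # place the object in the scene
view = translation(0, 0, -5)      # move the scene in front of the camera
projection = uniform_scale(0.5)   # stand-in for a real projection matrix

# Column-vector convention: the matrix applied first sits rightmost.
wvp = mat_mul(projection, mat_mul(view, world))
clip_position = transform(wvp, [1, 2, 3, 1])
```

Because the hardware applied only this one transform, anything beyond it (skinning, procedural displacement, and so on) had to be done on the CPU, which is the inflexibility the slide refers to.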

Programmable Shader. Shaders offer greater flexibility. Vertex shaders allow the manipulation of vertex data. Pixel shaders allow the manipulation of pixel data.

Programmable Graphics Pipeline (early age). Application (scene management: vertices, ...) -> Vertex operations (transform and lighting; culling, clipping) -> Pixel operations (triangle setup and rasterization; shading, multi-texturing; alpha test, depth buffering, ...) -> Display. Programmable stages: vertex shader and pixel shader. A vertex shader operates on one vertex at a time and cannot add vertices. A pixel shader operates on one pixel at a time and cannot add pixels.
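The one-in, one-out rule can be pictured as a plain map over the stage's inputs. This is a hedged sketch rather than any real API: `run_stage`, the shader bodies, and the uniform argument are all hypothetical.

```python
def vertex_shader(vertex, offset):
    """Runs once per vertex and returns exactly one transformed vertex."""
    x, y = vertex
    return (x + offset[0], y + offset[1])


def pixel_shader(pixel):
    """Runs once per pixel and returns exactly one color."""
    x, y = pixel
    return (x / 4.0, y / 4.0, 0.0)  # hypothetical gradient


def run_stage(shader, items, *uniforms):
    """Apply a shader to each element independently: a map, never an expand."""
    return [shader(item, *uniforms) for item in items]


vertices = [(0, 0), (1, 0), (0, 1)]
moved = run_stage(vertex_shader, vertices, (2, 3))

pixels = [(x, y) for y in range(2) for x in range(2)]
colors = run_stage(pixel_shader, pixels)
```

Each call sees only its own element, so the output always has the same length as the input. That independence is also what lets the hardware run the calls in parallel.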

Shader Assembly. The first programmable shader model on PC: Microsoft Direct3D 8 (2000), NVIDIA GeForce 3 and ATI Radeon 8500 (2001). Mnemonic instructions correspond to the machine instructions of the programmable-shader H/W. Too difficult to develop with, and H/W-dependent.

NVIDIA Cg. C for graphics: syntax similar to C, with many restrictions and exceptions. Integrated with the NVIDIA Cg SDK. Supports various targets: GeForce series, DirectX versions, or OpenGL extensions.

NVIDIA Cg. Example code: Phong shading, evaluated per fragment. void main(position : TEXCOORD0, normal : TEXCOORD1, ocolor : COLOR, ambientcol, lightcol, lightpos, eyepos, Ka, Kd, Ks, shiny) { P = position.xyz;
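Since the Cg listing breaks off after its first statement, the following Python sketch shows what a per-fragment Phong computation typically looks like. It reuses the slide's parameter names, but the body is the textbook Phong reflection model with scalar coefficients, assumed for illustration and not recovered from the original shader.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scaled(a, s):
    return [x * s for x in a]


def normalize(a):
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]


def phong(position, normal, ambientcol, lightcol, lightpos, eyepos,
          Ka, Kd, Ks, shiny):
    """Return an RGB color for one fragment (ambient + diffuse + specular)."""
    P = position
    N = normalize(normal)
    L = normalize(sub(lightpos, P))           # fragment -> light
    V = normalize(sub(eyepos, P))             # fragment -> eye
    n_dot_l = max(dot(N, L), 0.0)
    R = sub(scaled(N, 2.0 * dot(N, L)), L)    # L mirrored about N
    spec = max(dot(R, V), 0.0) ** shiny if n_dot_l > 0.0 else 0.0
    return [Ka * a + (Kd * n_dot_l + Ks * spec) * c
            for a, c in zip(ambientcol, lightcol)]


lit = phong([0, 0, 0], [0, 0, 1], [0.1, 0.1, 0.1], [1.0, 1.0, 1.0],
            lightpos=[0, 0, 5], eyepos=[0, 0, 5],
            Ka=1.0, Kd=0.5, Ks=0.4, shiny=16)
```

On a GPU this function body would run once for every fragment, with `position` and `normal` interpolated across the triangle and the remaining arguments supplied as uniform parameters.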

Microsoft HLSL. Microsoft adopts NVIDIA Cg into the Direct3D API. Microsoft Direct3D 9 (2002): HLSL (High Level Shading Language) 2.0 and Shader Model 2.0. It became the de facto standard shader language on PC graphics hardware. ATI Radeon 9500 (2002). NVIDIA GeForce FX (2003).

Restrictions of Shader Model 2.0. Severe limitations on resources: 256 instructions per program; 16 temporary 4-vector registers; 256 uniform parameter registers; 2 address registers (4-vector); 6 clip-distance outputs; 16 per-vertex attributes (only). Texture sampling is available in the pixel shader only. No dynamic flow control: loops are unrolled, and skipping code conditionally does not save time.
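The last two points can be emulated in a few lines. Without dynamic flow control, a conditional is evaluated on both sides and the result is then selected, and a loop with a fixed trip count is written out in full. The function names and the evaluation counter are illustrative assumptions.

```python
def select(condition, if_value, else_value):
    """Predicated select: both inputs are already computed when this runs."""
    return if_value if condition else else_value


def shade_without_branching(x, counter):
    """Emulate a conditional without dynamic flow control."""
    counter["evaluations"] += 2   # both sides always cost time
    expensive = x * x * x         # "if" side
    cheap = x + 1.0               # "else" side
    return select(x > 0.0, expensive, cheap)


def sum4_unrolled(values):
    """A fixed-count loop written out the way a compiler would unroll it."""
    return values[0] + values[1] + values[2] + values[3]


work = {"evaluations": 0}
results = [shade_without_branching(x, work) for x in (-1.0, 2.0)]
```

Every input pays for both sides, whichever result is kept. This is the sense in which conditional skipping does not save time.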

Shader Model 3.0. Microsoft Direct3D 9.0c (2004). NVIDIA GeForce 6 series (2004). ATI X1x00 series (2005). HLSL 3.0 introduced; its grammar is almost identical to HLSL 2.0.

Shader Model 3.0. Many restrictions are relaxed. Several limitations still exist.

Shader Model 3.0. Dynamic branching: computationally heavy functions can be applied to only some areas.

Shader Model 3.0. Primitive instancing: render a large number of objects with one vertex set plus per-instance information.
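Instancing can be sketched as one shared vertex list combined with a small per-instance record. In this illustration the record is a hypothetical (offset, scale) pair; real APIs pass such data through a second vertex stream or similar mechanism.

```python
def draw_instanced(vertices, instances):
    """Emit one transformed copy of the shared vertex set per instance."""
    drawn = []
    for offset, scale in instances:
        drawn.append([(x * scale + offset[0], y * scale + offset[1])
                      for x, y in vertices])
    return drawn


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]       # stored once
instances = [((0.0, 0.0), 1.0), ((5.0, 0.0), 2.0)]    # per-instance data
meshes = draw_instanced(triangle, instances)
```

The geometry is stored and submitted once. Only the short per-instance records grow with the number of objects.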

Shader Load Balancing. Asymmetry in shader computation ability: in traditional graphics applications, vertex processing has been dwarfed by pixel processing. But the number of vertices has rapidly increased (detailed modeling of objects) and the number of objects has exploded (e.g. MMORPGs). Vertex processing is not lightweight anymore.

Load Balancing. The load balancing problem.

Unified Shader. Why use separate cores for VS and PS? Since SM 3.0, the distinction between VS and PS has become meaningless: vertex shaders sample textures, and both support dynamic branches.
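A back-of-the-envelope model shows what separate cores cost when the workload is lopsided. It is idealized: it ignores the dependency of pixel work on vertex output, and the unit counts and work amounts are hypothetical.

```python
import math


def frame_time_split(vertex_work, pixel_work, vertex_units, pixel_units):
    """Dedicated units: the frame waits for whichever stage finishes last."""
    return max(math.ceil(vertex_work / vertex_units),
               math.ceil(pixel_work / pixel_units))


def frame_time_unified(vertex_work, pixel_work, total_units):
    """Unified units: any unit can take either kind of work."""
    return math.ceil((vertex_work + pixel_work) / total_units)


# Hypothetical vertex-heavy frame on a chip with 8 vertex and 24 pixel units.
split = frame_time_split(800, 240, vertex_units=8, pixel_units=24)
unified = frame_time_unified(800, 240, total_units=32)
print(split, unified)
```

With dedicated units the 24 pixel units sit mostly idle while the 8 vertex units are the bottleneck. Pooling all 32 units removes that imbalance in this model.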

Unified Shader

Unified Shader ATI Xenos in Microsoft Xbox 360 (2005)

Shader Model 4.0. Microsoft Direct3D 10 (2006). NVIDIA GeForce 8 series (2006). ATI Radeon 2900. Unified shader. High flexibility: many limitations are removed or relaxed (number of instructions, constants, variables, ...). Flexible branching: loops, conditional branching, ...

Shader Model 4.0. New pipeline.

Shader Model 4.0

Direct3D 11. Shader Model 5.0 with HLSL 5.0. Tessellation stages: two programmable shaders, the hull shader and the domain shader. Compute shader (DirectCompute): a new programmable stage for GPGPU support.

General-Purpose GPU. The GPU is a processor: extreme performance with high parallelism, for special purposes. But modern GPUs are no longer designed only for special purposes: flexible programming, sufficient bidirectional memory bandwidth.

General-Purpose GPU. The GPU is cheaper than other parallel units. Huge market: computer games, cinema industry. Why not use the GPU for general-purpose heavy computations?

GPGPU. Ideal application: high arithmetic intensity; large data sets; lots of work to do without CPU intervention; high parallelism; minimal dependencies between elements.
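Arithmetic intensity can be estimated as floating-point operations per byte of memory traffic. The sketch below compares two standard kernels under idealized traffic assumptions (each operand is read once and each result is written once). The function names are illustrative.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Floating-point operations performed per byte of memory traffic."""
    return flops / bytes_moved


def saxpy_intensity(n, bytes_per_float=4):
    # y[i] = a * x[i] + y[i]: 2 flops; reads x[i] and y[i], writes y[i].
    return arithmetic_intensity(2 * n, 3 * n * bytes_per_float)


def matmul_intensity(n, bytes_per_float=4):
    # n x n matrix product: 2 * n^3 flops; reads A and B, writes C once each.
    return arithmetic_intensity(2 * n ** 3, 3 * n * n * bytes_per_float)


low = saxpy_intensity(1_000_000)   # stays at 1/6 regardless of n
high = matmul_intensity(1024)      # grows linearly with n
print(f"SAXPY: {low:.3f} flops/byte, matmul: {high:.1f} flops/byte")
```

Kernels like the matrix product, where the work per byte grows with problem size, fit the slide's profile of an ideal GPGPU application. SAXPY is limited by memory bandwidth however many arithmetic units are available.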

GPGPU Early Days. Programming by exploiting GPU functionality: streams/arrays as textures; parallel loops as drawn quads; memory reads as texture fetches; memory writes as frame buffer output; classification of data as depth tests; value accumulation as alpha blending.
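The mapping above can be emulated on the CPU to show how a computation such as adding two arrays was phrased as rendering. Everything here is a hypothetical stand-in for graphics API calls, and the depth-test trick is not modeled.

```python
def draw_quad(width, height, fragment_program, textures,
              framebuffer=None, blend=False):
    """Emulate rendering a screen-sized quad: one program call per pixel.

    textures: dict of 2D lists, read through "texture fetches".
    framebuffer: optional existing 2D list; with blend=True the output is
    added to it in place, imitating additive blending.
    """
    if framebuffer is None:
        framebuffer = [[0.0] * width for _ in range(height)]
    for y in range(height):          # on a GPU these iterations run in parallel
        for x in range(width):
            value = fragment_program(x, y, textures)
            if blend:
                framebuffer[y][x] += value   # accumulation via blending
            else:
                framebuffer[y][x] = value    # memory write = frame buffer output
    return framebuffer


def add_arrays(x, y, textures):
    # Memory read = texture fetch.
    return textures["a"][y][x] + textures["b"][y][x]


textures = {"a": [[1.0, 2.0], [3.0, 4.0]],
            "b": [[10.0, 20.0], [30.0, 40.0]]}
result = draw_quad(2, 2, add_arrays, textures)
```

Calling `draw_quad` a second time with `framebuffer=result, blend=True` accumulates a second pass into the same buffer. That is how reductions and running sums were expressed before compute APIs existed.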

GPGPU Early Days. GPU wrappers for general-purpose programming package such exploitations as an API set. BrookGPU: developed at Stanford University; a C-like language with streaming extensions; compiles GPGPU-coded kernels to D3D/OpenGL shading models.

GPGPU. H/W vendors noticed: ATI CTM (2006); NVIDIA CUDA (2007), dominant in the market. Standardization: OpenCL (2008).

Q/A