A Trip Down The (2011) Rasterization Pipeline

A Trip Down The (2011) Rasterization Pipeline. Aaron Lefohn (Intel / University of Washington), Mike Houston (AMD / Stanford)

This talk: An overview of the real-time rendering pipeline available in ~2011, corresponding to the graphics APIs DirectX 11 and OpenGL 4.x. We discuss what changes from DX9 to DX11 and the key uses of these new features.

General Rasterization Pipeline: Application -> Graphics Processing (Unit): Geometry Processing -> Rasterization -> Pixel Processing. Geometry processing transforms geometry, generates more geometry, and computes per-vertex attributes. Rasterization sets up a primitive (e.g., a triangle) and finds all samples inside the primitive. Pixel processing interpolates vertex attributes and computes the pixel color. (Slide by Tomas Akenine-Möller)
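The rasterization and pixel-processing steps can be illustrated with a minimal software sketch. It is not how any particular GPU implements them: it assumes counter-clockwise 2D screen-space vertices, one sample at each pixel center, a single scalar attribute per vertex, and it omits the fill rule that real rasterizers use to avoid shading shared edges twice. The function names are hypothetical.

```python
def edge(a, b, p):
    """Twice the signed area of triangle a-b-p; positive when p is left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, attrs, width, height):
    """Return (x, y, value) for every pixel whose center lies inside the triangle.

    v0..v2 are 2D positions in counter-clockwise order (an assumption of this
    sketch); attrs holds one scalar attribute per vertex.
    """
    area = edge(v0, v1, v2)
    if area <= 0:
        return []  # degenerate or clockwise triangle: nothing to draw
    fragments = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(v1, v2, p)
            w1 = edge(v2, v0, p)
            w2 = edge(v0, v1, p)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:  # inside all three edges
                # Barycentric weights interpolate the per-vertex attribute.
                b0, b1, b2 = w0 / area, w1 / area, w2 / area
                value = b0 * attrs[0] + b1 * attrs[1] + b2 * attrs[2]
                fragments.append((x, y, value))
    return fragments


frags = rasterize((0, 0), (4, 0), (0, 4), (0.0, 1.0, 0.0), 4, 4)
print(len(frags), frags[0])
```

The three edge tests correspond to "finds all samples inside the primitive", and the barycentric weights correspond to "interpolates vertex attributes".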

DX10

Remember DX9? Pipeline stages: Input Assembler -> Vertex Shader -> Rasterization -> Pixel Shader -> Output Merger. Memory resources: index buffer, vertex buffer, texture, depth/stencil, render target. (Slide by Tomas Akenine-Möller)

DX10 pipeline stages: Input Assembler -> Vertex Shader -> Geometry Shader -> Rasterization -> Pixel Shader -> Output Merger. Memory resources: index buffer, vertex buffer, texture (available to the vertex, geometry, and pixel shaders), stream output, depth/stencil, render target. (Slide by Tomas Akenine-Möller)

DX10 Increased Shading Capability: In addition to adding a new pipeline stage, DX10 greatly increased the programmability of the vertex and fragment stages. All shader stages have the same limitations (unified shading cores). Pixel/fragment shaders grow up (instruction limits, register limits, flow control, etc.). Floating-point precision requirements are well specified. A fragment can write up to 8 fp32x4 outputs.

Geometry Shader. Input: the vertices of a full primitive (1 for a point, 2 for a line, 3 for a triangle), the vertices of edge-adjacent primitives, and a primitive ID, which allows fetching or computing per-face data. The shader can also read textures. Output: a point, line, or triangle strip, sent to the rasterizer or to a vertex buffer in memory (stream output). The variable output size makes this stage hard to parallelize. (Slide by Tomas Akenine-Möller)

Geometry Shader usage examples. Point sprite tessellation: the shader takes in a single vertex and generates four vertices (two output triangles) that represent the four corners of a quad. Wide line tessellation: the shader receives two line vertices and generates four vertices for a quad that represents a widened line. Other uses: generation of shadow volumes and cube maps. (Slide by Tomas Akenine-Möller)
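The two expansions can be sketched on the CPU as follows. This is only an illustration of the geometry involved, assuming 2D screen-space positions and a triangle-strip output order; the function names are hypothetical and no real shader API is used.

```python
import math


def expand_point_sprite(center, half_size):
    """One input vertex -> four quad corners, emitted in triangle-strip order."""
    cx, cy = center
    h = half_size
    return [(cx - h, cy - h), (cx + h, cy - h), (cx - h, cy + h), (cx + h, cy + h)]


def expand_wide_line(p0, p1, half_width):
    """Two line vertices -> four quad corners offset along the line's normal."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    nx, ny = -dy / length * half_width, dx / length * half_width
    return [(p0[0] - nx, p0[1] - ny), (p0[0] + nx, p0[1] + ny),
            (p1[0] - nx, p1[1] - ny), (p1[0] + nx, p1[1] + ny)]


def strip_to_triangles(strip):
    """Unroll a triangle strip; every second triangle is flipped to keep winding."""
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles


quad = expand_point_sprite((0.0, 0.0), 1.0)
print(strip_to_triangles(quad))
print(expand_wide_line((0.0, 0.0), (2.0, 0.0), 0.5))
```

In both cases four strip vertices produce the two triangles mentioned on the slide. A real point sprite would usually be expanded in view space so that it faces the camera, which this sketch leaves out.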

Point Sprites and Wide Lines (scene courtesy of Valve Corporation). Smoke is represented by point sprites: points are converted to quads in the geometry shader. Hair is represented by wide lines: lines are converted to quads in the geometry shader. Images from Adaptive Volumetric Shadow Maps, Salvi et al., EGSR 2010.

Render-To-Volume (Geometry Shader). (Slide by Sam Glassenberg)

Single-Pass Render-To-Cubemap (Geometry Shader). (Slide by Sam Glassenberg)

Direct3D 10.1 Features: Full anti-aliasing control. The application controls: multi-sample AA (smooth edges) or super-sample AA (smooth edges and interior); selection of sample patterns; the pixel coverage mask. High-quality vegetation, motion blur, particles. A minimum of 4 samples/pixel is required. (Slide by Sam Glassenberg)

DX11

Remember DX10? Pipeline stages: Input Assembler -> Vertex Shader -> Geometry Shader -> Rasterization -> Pixel Shader -> Output Merger. Memory resources: index buffer, vertex buffer, texture (available to the vertex, geometry, and pixel shaders), stream output, depth/stencil, render target. (Slide by Tomas Akenine-Möller)

DX11 pipeline stages: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellation -> Domain Shader -> Geometry Shader -> Rasterization -> Pixel Shader -> Output Merger. Additional diagram labels: patch info (hull shader), texture, stream output, arbitrary memory region (pixel shader). Memory resources: index buffer, vertex buffer, depth/stencil, render target. (Slide by Tomas Akenine-Möller)

DX11 Tessellation example: displaced subdivision surfaces. (Slide by Tomas Akenine-Möller)

DX11: new stages for tessellation. New programmable stages: Hull Shader and Domain Shader. New fixed-function stage: Tessellator. Pipeline: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellator -> Domain Shader -> Geometry Shader -> Rasterization -> Pixel Shader -> Output Merger. (Slide by Tomas Akenine-Möller)

Tessellation, Input Assembler: A new patch primitive type. Outputs vertices to the vertex shader and patch control points to the hull shader. (Slide by Tomas Akenine-Möller)

Tessellation, Vertex Shader: One invocation per control point. For example, skin or animate the control points. (Slide by Tomas Akenine-Möller)

Hull Shader: The HS works on an entire patch and can access all the input and output control points. Typical functions: assigning edge LODs and changing the basis. (Slide by Tomas Akenine-Möller)
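As an illustration of "assigning edge LODs", the sketch below computes one tessellation factor per patch edge. The heuristic (edge length divided by distance to the eye) and every name in it are assumptions made for this example, not something the slides prescribe. The property worth noting is that the factor depends only on the edge's own two endpoints, so two patches sharing an edge arrive at the same value. The clamp to 64 reflects the maximum tessellation factor in Direct3D 11.

```python
import math


def edge_tess_factor(p, q, eye, target_len=1.0, max_factor=64.0):
    """Tessellation factor for edge p-q (hypothetical distance-based heuristic)."""
    # Sort the endpoints so both patches sharing this edge evaluate the
    # expression with identical operands, whatever order they list them in.
    p, q = sorted((tuple(p), tuple(q)))
    mid = tuple((a + b) / 2 for a, b in zip(p, q))
    length = math.dist(p, q)
    distance = max(math.dist(mid, eye), 1e-6)  # avoid division by zero
    return min(max(length / (distance * target_len), 1.0), max_factor)


a, b = (0.0, 0.0, 0.0), (4.0, 0.0, 0.0)
print(edge_tess_factor(a, b, eye=(2.0, 0.0, 2.0)))
print(edge_tess_factor(b, a, eye=(2.0, 0.0, 2.0)))  # same edge, other patch
```

If the factor instead depended on per-patch data, neighboring patches could subdivide their shared edge differently and leave cracks.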

Tessellator: The TS inputs are edge LODs and additional knob values (inner tessellation and tessellation mode). The TS generates (u,v) coordinates and connectivity information. The (u,v) coordinates lie in the domain [0,1] and are calculated using fixed-point math to ensure watertight edges. (Slide by Tomas Akenine-Möller)
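A heavily simplified sketch of what the tessellator produces for a quad domain is shown below. It assumes a single integer factor `n` for the whole patch, whereas the real stage takes separate edge and inner factors and a partitioning mode. Deriving each coordinate from an integer index, rather than by accumulating a floating-point step, is used here as a stand-in for the fixed-point arithmetic mentioned on the slide. The function name is hypothetical.

```python
def tessellate_quad(n):
    """Uniformly tessellate the unit-square domain into 2*n*n triangles.

    Returns (points, triangles): points are (u, v) pairs in [0, 1] and
    triangles are index triples into points (the connectivity information).
    """
    # Each coordinate comes from an integer index, so a given index always
    # yields the same value regardless of traversal direction.
    points = [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]
    triangles = []
    for j in range(n):
        for i in range(n):
            a = j * (n + 1) + i   # lower-left corner of the cell
            b = a + 1             # lower-right
            c = a + (n + 1)       # upper-left
            d = c + 1             # upper-right
            triangles.append((a, b, d))
            triangles.append((a, d, c))
    return points, triangles


points, triangles = tessellate_quad(2)
print(len(points), len(triangles))
```

Each generated (u,v) pair then becomes one domain shader invocation.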

Domain Shader: One domain shader invocation per (u,v) pair. The DS gets (u,v) from the tessellator and control points from the HS. It computes a real 3D point from a domain location (u,v), for example by displacing the point using a displacement map; projects the point; and calculates auxiliary per-vertex data such as texture coordinates and tangent-space vectors. (Slide by Tomas Akenine-Möller)
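The evaluation step can be sketched as below. To keep it short, the patch is assumed to be a bilinear quad with four corner control points and a constant normal, and the displacement map is a plain Python callable; real content would typically use a higher-order basis and a texture fetch, and would project the result afterwards. All names are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between two 3D points."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def domain_shader(u, v, corners, displacement, normal=(0.0, 0.0, 1.0)):
    """Map a domain location (u, v) to a displaced 3D point.

    corners is (p00, p10, p01, p11); displacement(u, v) returns a height that
    is applied along the (assumed constant) normal. Projection is omitted.
    """
    p00, p10, p01, p11 = corners
    base = lerp(lerp(p00, p10, u), lerp(p01, p11, u), v)  # bilinear patch
    height = displacement(u, v)
    return tuple(b + height * n for b, n in zip(base, normal))


unit_quad = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0))
print(domain_shader(0.5, 0.5, unit_quad, lambda u, v: 0.5))
```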

An example through the tessellation pipeline. (Slide by Tomas Akenine-Möller)

DX11 Tessellation Examples: Codemasters DiRT2. Slides from DiRT2 DirectX 11 Technology, Thomas and Story, GDC 2010.

[Image slides labeled Original Mesh, PN-Triangles, PN-Triangles + Displacements, and Displacement Mapped Surface. Slides from the GDC 2010 presentation DiRT2 DirectX11 Technology, Tomas Thomas (Codemasters) and Jon Story (AMD).]

[Image slides labeled Displacement Mapping: OFF and Displacement Mapping: ON.]

DX11 Pixel Shader

DirectX11 Pixel Shader Changes: Read and write anywhere in memory through unordered access views (UAVs). Pixel shaders can now write somewhere other than the pixel position (!). Atomic read-modify-write operations: add, subtract, min, max, compare-exchange, and, or, xor.

DX11 Pixel Shader Implications: Pixel shaders can build data structures other than images, using scatter/gather and atomic operations, and can render to a user-defined data structure. Limitations: atomic RMW operations work only on 32-bit integers (no atomic struct RMW), and there are no critical sections or mutexes.

Real-time A-Buffer Generation: Pixel Shader 5 Example

Order Independent Transparency I. Real-Time Concurrent Linked List Construction on the GPU, Yang et al., EGSR 2010.

The Problem with Transparency: Sorting is hard! Image labels: Skeleton hidden; No Sorting; With Sorting; Arm appears in front of body. Images from Real-Time Concurrent Linked List Construction on the GPU, Yang et al., EGSR 2010.

Solution: Sort Fragments Per Pixel. Create an A-buffer using the DX11 rendering pipeline (The A-buffer, an antialiased hidden surface method, Carpenter, SIGGRAPH 1984). Render pass 1: store a linked list of fragments per pixel in 2 separate buffers. UAV 1: storage for all [RGBA, Z, next] values from all fragments. UAV 2: image-sized storage for a head pointer per pixel position. Render pass 2: a full-screen pass that sorts and blends fragments to create the pixel color.
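The two passes can be mimicked on the CPU with the sketch below. It is a sequential stand-in, not shader code: the list append and the head-pointer swap take the place of the atomic counter increment and atomic exchange that a pixel shader would perform on the two UAVs. The class name, the `NULL` marker, the convention that larger z is farther away, and the use of standard "over" blending are all assumptions of this example.

```python
NULL = -1  # end-of-list marker (an assumption of this sketch)


class ABuffer:
    def __init__(self, width, height):
        self.width = width
        self.nodes = []                         # "UAV 1": (rgba, z, next) records
        self.head = [NULL] * (width * height)   # "UAV 2": head pointer per pixel

    def insert(self, x, y, rgba, z):
        """Pass 1: what one pixel shader invocation does for one fragment."""
        index = len(self.nodes)         # stands in for an atomic counter increment
        pixel = y * self.width + x
        previous = self.head[pixel]     # stands in for an atomic exchange
        self.head[pixel] = index
        self.nodes.append((rgba, z, previous))

    def resolve(self, x, y, background=(0.0, 0.0, 0.0)):
        """Pass 2: walk the pixel's list, sort far-to-near, blend with 'over'."""
        fragments = []
        index = self.head[y * self.width + x]
        while index != NULL:
            rgba, z, index = self.nodes[index]
            fragments.append((z, rgba))
        fragments.sort(key=lambda f: f[0], reverse=True)  # larger z = farther
        color = background
        for _, (r, g, b, a) in fragments:
            color = tuple(a * s + (1.0 - a) * d for s, d in zip((r, g, b), color))
        return color


buf = ABuffer(2, 2)
buf.insert(0, 0, (1.0, 0.0, 0.0, 0.5), z=1.0)  # near, red
buf.insert(0, 0, (0.0, 0.0, 1.0, 0.5), z=2.0)  # far, blue
print(buf.resolve(0, 0))
```

Because sorting happens per pixel in the second pass, the result does not depend on the order in which fragments were submitted. The node storage grows with every fragment, which is the unbounded memory use listed among the cons.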

DX11 A-Buffer. Pros: a correct solution to order-independent transparency; runs at interactive rates on current hardware. Cons: A-buffers use an unbounded amount of memory; performance varies depending on the order in which fragments are stored in memory.

DX11 ComputeShader / DirectCompute: A new out-of-pipeline programming model for general computation, designed to interact tightly with the graphics API (the language is HLSL). Similar in semantics and capability to OpenCL and CUDA. We will discuss DirectCompute later in the course.

Summary: The jump from DX9 to DX11 (OGL2 to OGL4) adds 4 new pipeline stages (hull, tessellator, domain, geometry); makes shader stages far more general compute engines; enables users to render to user-defined data structures (and defer visibility determination); greatly relaxes the memory model of shaders (especially scatter/atomics); adds DirectCompute, which begins to blur the lines between the pipeline and user-defined parallel programs; and is still new with respect to adoption in the game world (but a number of DX11 titles are shipping). Get to know DX11/GL4 well: you can do *a lot* within the confines of the rendering pipeline.

Backup

Geometry Shader Example: Shadow volume generation. (Slide by Sam Glassenberg)

Geometry Shader Example: Generalized displacement maps. Displacement Mapping (Direct3D 10). (Slide by Sam Glassenberg)

New DX10 and DX10.1 features are heavily used for shadow techniques in production games. Some examples from High Quality Direct3D 10.0 & 10.1 Accelerated Techniques, Story and Gruen, GDC 2009.

HDAO On/Off. Buffer (depth only): 40 gathers, no filtering needed. HDAO code sample.

HDAO On. Buffer (depth & normals): 80 gathers (could use a lot fewer), no filtering needed.

Tom Clancy's HAWX. Publisher: Ubisoft. Developer: Ubisoft Romania.

Stormrise. Publisher: SEGA. Developer: The Creative Assembly Australia.

BattleForge. Publisher: EA. Developer: EA Phenomic.

From DICE's Frostbite Engine: Uniform Shadow Filtering

From DICE's Frostbite Engine: Unique Weight Shadow Filtering

From DICE's Frostbite Engine: Standard 2x2 Shadow Filtering

From DICE's Frostbite Engine: 5x5 Unique Weight Filtering

Tom Clancy's HAWX. Publisher: Ubisoft. Developer: Ubisoft Romania. Blurred VSM.

Tom Clancy's HAWX. Publisher: Ubisoft. Developer: Ubisoft Romania. Unique Weight Shadows.