Introduction to the Direct3D 11 Graphics Pipeline

Similar documents
Motivation MGB Agenda. Compression. Scalability. Scalability. Motivation. Tessellation Basics. DX11 Tessellation Pipeline

Approximating Subdivision Surfaces with Gregory Patches for Hardware Tessellation

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Approximate Catmull-Clark Patches. Scott Schaefer Charles Loop

Lecture 4: Geometry Processing. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

A Trip Down The (2011) Rasterization Pipeline

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

Craig Peeper Software Architect Windows Graphics & Gaming Technologies Microsoft Corporation

Direct3D 11 Performance Tips & Tricks

Working with Metal Overview

The Application Stage. The Game Loop, Resource Management and Renderer Design

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

Efficient GPU Rendering of Subdivision Surfaces. Tim Foley,

Direct Rendering of Trimmed NURBS Surfaces

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Rendering Subdivision Surfaces Efficiently on the GPU

Real-Time Hair Simulation and Rendering on the GPU. Louis Bavoil

Beginning Direct3D Game Programming: 1. The History of Direct3D Graphics

Rendering Grass with Instancing in DirectX* 10

Lecture 2. Shaders, GLSL and GPGPU

D3D11 Tessellation. Sarah Tariq NVIDIA

DirectX10 Effects. Sarah Tariq

A Real-time Micropolygon Rendering Pipeline. Kayvon Fatahalian Stanford University

DirectX10 Effects and Performance. Bryan Dudash

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Why modern versions of OpenGL should be used Some useful API commands and extensions

CS GAME PROGRAMMING Question bank

GUERRILLA DEVELOP CONFERENCE JULY 07 BRIGHTON

CS GPU and GPGPU Programming Lecture 7: Shading and Compute APIs 1. Markus Hadwiger, KAUST

CS427 Multicore Architecture and Parallel Computing

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

Graphics Hardware. Instructor Stephen J. Guy

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

Ray Casting of Trimmed NURBS Surfaces on the GPU

PN-triangle tessellation using Geometry shaders

Chapter Answers. Appendix A. Chapter 1. This appendix provides answers to all of the book s chapter review questions.

! Readings! ! Room-level, on-chip! vs.!

Next-Generation Rendering of Subdivision Surfaces. Developer Technology - NVIDIA

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

GCN Performance Tweets AMD Developer Relations

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1

Real-Time Hair Rendering on the GPU NVIDIA

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

GeForce4. John Montrym Henry Moreton

Course Recap + 3D Graphics on Mobile GPUs

Lecture 6: Texturing Part II: Texture Compression and GPU Latency Hiding Mechanisms. Visual Computing Systems CMU , Fall 2014

Lecture 25: Board Notes: Threads and GPUs

NVIDIA Parallel Nsight. Jeff Kiel

Could you make the XNA functions yourself?

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

Deus Ex is in the Details

Portland State University ECE 588/688. Graphics Processors

Tutorial on GPU Programming #2. Joong-Youn Lee Supercomputing Center, KISTI

INF3320 Computer Graphics and Discrete Geometry

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

Programming Graphics Hardware

Canonical Shaders for Optimal Performance. Sébastien Dominé Manager of Developer Technology Tools

Pump Up Your Pipeline

HW-Tessellation Basics

Lecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Real-Time Reyes: Programmable Pipelines and Research Challenges. Anjul Patney University of California, Davis

Introduction to Shaders.

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)

Spring 2011 Prof. Hyesoon Kim

Shaders (some slides taken from David M. course)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

I ll be speaking today about BC6H compression, how it works, how to compress it in realtime and what s the motivation behind doing that.

Graphics Processing Unit Architecture (GPU Arch)

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer

2.11 Particle Systems

Intel Core 4 DX11 Extensions Getting Kick Ass Visual Quality out of the Latest Intel GPUs

The Graphics Pipeline

Programmable GPUs. Real Time Graphics 11/13/2013. Nalu 2004 (NVIDIA Corporation) GeForce 6. Virtua Fighter 1995 (SEGA Corporation) NV1

PowerVR Hardware. Architecture Overview for Developers

Order Independent Transparency with Dual Depth Peeling. Louis Bavoil, Kevin Myers

GPGPU. Peter Laurens 1st-year PhD Student, NSC

Lecture 9: Deferred Shading. Visual Computing Systems CMU , Fall 2013

CS 354R: Computer Game Technology

Shaders. Slide credit to Prof. Zwicker

Spring 2009 Prof. Hyesoon Kim

NVIDIA Tools for Artists

Real-Time Shadows. Last Time? Textures can Alias. Schedule. Questions? Quiz 1: Tuesday October 26 th, in class (1 week from today!

Introduction to CUDA Algoritmi e Calcolo Parallelo. Daniele Loiacono

Per-Face Texture Mapping for Realtime Rendering. Realtime Ptex

Chapter IV Fragment Processing and Output Merging. 3D Graphics for Game Programming

White Paper. Solid Wireframe. February 2007 WP _v01

CS452/552; EE465/505. Clipping & Scan Conversion

A SIMD-efficient 14 Instruction Shader Program for High-Throughput Microtriangle Rasterization

Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express

Beyond Programmable Shading. Scheduling the Graphics Pipeline

Wed, October 12, 2011

3D Rendering Pipeline

Fast Tessellated Rendering on Fermi GF100. HPG 2010 Tim Purcell

Engine Development & Support Team Lead for Korea UE4 Mobile Team Lead

Transcription:

Introduction to the Direct3D 11 Graphics Pipeline Kevin Gee - XNA Developer Connection Microsoft Corporation 2008 NVIDIA Corporation.

Direct3D 11 focuses on Key Takeaways Increasing scalability, Improving the development experience, Extending the reach of the GPU, Improving Performance. Direct3D 11 is a strict superset of D3D 10 & 10.1 Adds support for new features Start developing on Direct3D 10/10.1 today Available on Windows Vista & future Windows operating systems Supports 10 / 10.1 level hardware

Drilldown Tessellation Compute Shader Multithreading Outline Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

Character Authoring Pipeline (Rocket Frog Taken From Loop &Schaefer, "Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches ) Sub-D Modeling Animation Displacement Map Polygon Mesh Generate LODs

Character Authoring (Cont d) Trends Denser meshes, more detailed characters ~5K triangles -> 30-100K triangles Complex animations Result Animations i on polygon mesh vertices more costly Integration in authoring pipeline painful Larger memory footprints causing painful I/O issues Solution Use the higher-level surface representation longer Animate control cage (~5K vertices) Generate displacement & normal maps

Direct3D 11 Pipeline Input Assembler Vertex Shader Hull Shader Tessellator Domain Shader Direct3D 10 pipeline Plus Three new stages for Tessellation Geometry Shader Stream Output Rasterizer Pixel Shader Output Merger

Hull Shader (HS) HS input: patch control pts Hull Shader One Hull Shader invocation per patch HS output: Patch control pts after Basis conversion HS output: TessFactors (how much to tessellate) fixed tessellator mode declarations Tessellator Domain Shader

Fixed-Function Tessellator (TS) Note: Tessellator does not see control points Hull Shader TS input: TessFactors (how much to tessellate) fixed tessellator mode declarations Tessellator operates per patch TS output: U V {W} domain points Tessellator Domain Shader TS output: topology (to primitive assembly)

Domain Shader (DS) Hull Shader DS input: Tessellator control points TessFactors DS input: U V {W} domain points Domain Shader One Domain Shader DS output: invocation per point from Tessellator one vertex

Direct3D 11 Pipeline Input Assembler Vertex Shader Hull Shader Tessellator Domain Shader Geometry Shader Rasterizer Pixel Shader Output Merger Stream Output D3D11 HW Feature D3D11 Only Fundamental primitive is patch (not triangle) Superset of Xbox 360 tessellation

Example Surface Processing Pipeline Single-pass process! vertex shader hull shader tessellator domain shader Animate/skin Control Points Transform basis, Determine how much to tessellate Tess Factors Tessellate! Evaluate surface including displacement patch control points transformed control points control points in Bezier patch U V {W} domain points displacement map Sub-D Patch Bezier Patch

New Authoring Pipeline (Rocket Frog Taken From Loop &Schaefer, "Approximating Catmull-Clark Subdivision Surfaces with Bicubic Patches ) Sub-D Modeling Animation Displacement Map Optimally Tessellated Mesh GPU

Provides Tessellation: Summary Smooth silhouettes Richer animations for less Scale visual quality across hardware configurations Supports performance improvements Coarse model = compression, faster I/0 to GPU Cheaper skinning and simulation Improve pixel shader quad utilization Scalable rendering for each end user s hardware Render content as artists intend it!

Drilldown Tessellation Compute Shader Multithreading Outline Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

GPGPU & Data Parallel Computing GPU performance continues to grow Many applications scale well to massive parallelism without tricky code changes Direct3D is the API for talking to GPU How do we expand Direct3D to GPGPU?

Direct3D 11 Pipeline Input Assembler Vertex Shader Hull Shader Tessellator Domain Shader Geometry Shader Rasterizer Pixel Shader Output Merger Stream Output Direct3D 10 pipeline Plus Three new stages for Tessellation Plus Data Structure Compute Shader Compute Shader

Integration with Direct3D Fully supports all Direct3D resources Targets graphics/media data types Evolution of DirectX HLSL Graphics pipeline updated to emit general data structures which can then be manipulated by compute shader And then rendered by Direct3D again

Example Scenario Input Assembler Vertex Shader Hull Shader Tessellator Domain Shader Render scene Write out scene image Use Compute for image post-processing Output final image Geometry Shader Stream Output Rasterizer Pixel Shader Output Merger Data Structure Compute Shader

Target Applications Image/Post processing: Image Reduction Image Histogram Image Convolution Image FFT A-Buffer/OIT Ray-tracing, radiosity, etc. Physics AI

Compute Shader: Summary Enables much more general algorithms Transparent parallel processing model Full cross-vendor support Broadest possible installed base

Overview Drilldown Tessellation Compute Shader Multithreading Outline Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

D3D11 Multithreading Goals Asynchronous resource loading Upload resources, create shaders, create state objects in parallel Concurrent with rendering Multithreaded draw & state submission Spread out render work across many threads Limited support for per-object display lists

Devices and Contexts D3D device functionality now split into three separate interfaces Device, Immediate Context, Deferred Context Device has free threaded resource creation Immediate Context t is your single primary device for state, draws, and queries Deferred Contexts are your per thread devices Deferred Contexts are your per-thread devices for state & draws

D3D11 Interfaces Render Thread Load Thread 1 Load Thread 2 Immediate Context DrawPrim DrawPrim DrawPrim DrawPrim Device CreateTexture CreateShader CreateVB CreateTexture CreateShader CreateShader CreateIB CreateVB DrawPrim

Async Resources Use the Device interface for resource creation All functions are free threaded Uses fine-grained sync primitives Resource upload and shader compilation can happen concurrently

State & Draw Submission First priority: multithreaded submission Single-use display lists Lower priority: per-object display lists Multiple l reuse D3D11 display lists are immutable

D3D11 Interfaces Immediate Deferred Deferred Context Context Context Display List Display List DrawPrim DrawPrim DrawPrim Execute Execute DrawPrim DrawPrim DrawPrim DrawPrim DrawPrim DrawPrim

Deferred Contexts Can create many deferred contexts Each one is single threaded (thread unsafe) Deferred context generates a Display List Display List is consumed by Immediate or Deferred contexts No read-backs or downloads from the GPU Queries Resource locking Lock with DISCARD is supported on deferred contexts

D3D11 on D3D10 H/W Deferred contexts are implemented at an API-capture level Async resource creation uses coarse sync primitives No longer free threaded; thread safe though D3D10 drivers can be updated d to better support D3D11 features Will work on Windows Vista as well as future Windows releases

Overview Drilldown Tessellation Compute Shader Multithreading Outline Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

Shader Issues Today Shaders getting bigger, more complex Shaders need to target wide range of hardware Optimization of different shader configurations drives shader specialization

Options: Über-shader Über-shader foo ( ) { if (m == 1) { // do material 1 } else if (m == 2) { // do material 2 } if (l == 1) { // do light model 1 } else if (l == 2) { // do light model 2 } }

Options: Über-shader One Shader to Rule them All Good: All functionality in one place Reduces state t changes at runtime One compile step Seems to be most popular coding method Bad: Complex, unorganized shaders Register usage is always worst case path

Options: Specialization Multiple specialized shaders for each combination of settings Good: Always optimal register usage Easier to target optimizations Bad: Huge number of resulting shaders Pain to manage at runtime

Combinatorial Explosion Number of Lights Number of Materia als

Solution: Dynamic Shader Linkage & OOP Introducing new OOP features to HLSL Interfaces Classes Can be used for static code Also used as the mechanism for linking specific functionality at runtime

Interfaces interface Light { float3 GetDirection(float3 eye); }; float3 GetColor();

Classes class DirectionalLight : Light { float3 GetDirection(float3 eye) { return m_direction; } float3 GetColor() { return m_color; } }; float3 m_direction; float3 m_color;

Dynamic Shader Linkage Über-shader foo ( ) { if (m == 1) { // do material 1 } else if (m == 2) { // do material 2 } if (l == 1) { // do light model 1 } else if (l == 2) { // do light model 2 } } Dynamic Subroutine Material1( ) { } Material2( ) { } Light1( ) { } Light2( ) { } foo( ) { mymaterial.evaluate( ); mylight.evaluate( ); }

In the Runtime Select specific class instances you want Runtime will inline class methods Equivalent register usage to a specialized shader Inlining is done in the native assembly Fast operation Applies to all subsequent Draw( ) calls

Overview Drilldown Tessellation Compute Shader Multithreading Outline Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

Why New Texture Formats? Existing block palette interpolations too simple Results often rife with blocking artifacts No high h dynamic range (HDR) support NB: All are issues we heard from developers

Two New BC s for Direct3D11 BC6 (aka BC6H) High dynamic range 6:1 compression (16 bpc RGB) Targeting high (not lossless) visual quality BC7 LDR with alpha 3:1 compression for RGB or 4:1 for RGBA High visual quality

New BC s: Compression Block compression (unchanged) Each block independent Fixed compression ratios Multiple block types (new) Tailored to different types of content Smooth gradients vs. noisy normal maps Varied alpha vs. constant alpha Also new: decompression results must be bit-accurate with spec

Multiple Block Types Different numbers of color interpolation lines Less variance in one block means: 1 color line Higher-precision endpoints More variance in one block means: 2 (BC6 & 7) or 3 (BC7 only) color lines Lower-precision endpoints and interpolation bits Different numbers of index bits 2 or 3 bits to express position on color line Alpha Some blocks have implied 1.0 alpha Others encode alpha

Partitions When using multiple color lines, each pixel needs to be associated with a color line Individual bits to choose is expensive For a 4x4 block with 2 color lines 2 16 possible partition patterns 16 to 64 well-chosen partition patterns give a good approximation of the full set BC6H: 32 partitions BC7: 64 partitions, shares first 32 with BC6H

Example Partition Table A 32-partition table for 2 color lines

Comparisons Orig BC3 Orig BC7 Abs Error

Comparisons Orig BC3 Orig BC7 Abs Error

Comparisons HDR Original at given exposure Abs Error BC6 at given exposure

Overview Drilldown Tessellation Compute Shader Multithreading Outline Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

Lots of Other Features Addressable Stream Out Draw Indirect Pull-model attribute eval Improved Gather4 Min-LOD texture clamps 16K texture limits Required 8-bit subtexel, submip filtering precision Conservative odepth 2 GB Resources Geometry shader instance programming model Optional double support Read-only depth or stencil views

Overview Drilldown Tessellation Compute Shader Multithreading Outline Dynamic Shader Linkage Improved Texture Compression Quick Glance at Other Features Availability

Questions?

2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.