Designing a Modern GPU Interface

Brooke Hodgman (@BrookeHodgman) http://tiny.cc/gpuinterface

How to make a wrapper for D3D9/11/12, GL2/3/4, GL ES2/3, Metal, Mantle, Vulkan, GNM & GCM without going (completely) insane

Agenda
- GPU Interface: a wrapper around the native GPU APIs for every platform
- Pipeline state management
- Resource management
- Shader program management (e.g. Microsoft .fx or Nvidia .cgfx)
- Q&A

Where does this fit in?
- Shading Pipeline: deferred/forward shading, post-processing, order of passes, high-level techniques
- Scene Manager: spatial partitioning, camera management, object culling
- Game Engine Specific Drawable Types: generic models, particle systems, animated meshes, instanced meshes
- GPU Interface (today!): the lowest-level portable rendering API. Just a GPU abstraction.

Goals
- Flexibility: can do anything that the native APIs let us do. No cutting out features.
- Productivity: much simpler to use than the native APIs. Less code, and less mental tax.
- Performance: similar CPU frame-time to hand-written native code.
- Simplicity: keep the interface as small as possible.

Dog food test
- 22: PC, PS4, Xbox One
- Don Bradman Cricket (port): PS4, Xbox One
- Rugby League Live 3: Steam, PS4, PS3, Xbox One, Xbox 360

The GPU pipeline 2005-2015: SM2
Draw: Input Assembler -> Vertex Shader -> Rasterizer -> Pixel Shader -> Output Merger
Resources / Memory: Textures, Buffers, Samplers

The GPU pipeline 2005-2015: SM3
+ Vertex texture fetch
Draw: Input Assembler -> Vertex Shader -> Rasterizer -> Pixel Shader -> Output Merger
Resources / Memory: Textures, Buffers, Samplers

The GPU pipeline 2005-2015: SM4
+ Geometry Shader & Stream Out stage
+ Compute shaders
Draw: Input Assembler -> Vertex Shader -> Geometry Shader -> Rasterizer / Stream Out -> Pixel Shader -> Output Merger
Dispatch: Compute Shader
Resources / Memory: Textures, Buffers, Samplers

The GPU pipeline 2005-2015: SM5
+ Read-Write resources (UAVs) at pixel shader
+ Tessellation stages
Draw: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellator -> Domain Shader -> Geometry Shader -> Rasterizer / Stream Out -> Pixel Shader -> Output Merger
Dispatch: Compute Shader
Resources / Memory: Textures, Buffers, Samplers

The GPU pipeline 2005-2015: SM5+
+ Read-Write resources at every stage
Draw: Input Assembler -> Vertex Shader -> Hull Shader -> Tessellator -> Domain Shader -> Geometry Shader -> Rasterizer / Stream Out -> Pixel Shader -> Output Merger
Dispatch: Compute Shader
Resources / Memory: Textures, Buffers, Samplers

The GPU pipeline 2005-2015: most common features
Draw: Input Assembler -> Vertex Shader -> Rasterizer / Stream Out -> Pixel Shader -> Output Merger
Dispatch: Compute Shader
Resources / Memory: Textures, Buffers, Samplers

The GPU pipeline 2005-2015: most common features, API view (Draw)
API states: Input Layout, Programs, Raster, Depth / Stencil, Blend
Draw Command: Input Assembler -> Vertex Shader -> Rasterizer / Stream Out -> Pixel Shader -> Output Merger
Resource bindings: Buffer, Buffer, Buffer / Texture / Sampler, Depth Texture, Colour Texture

The GPU pipeline 2005-2015: most common features, API view (Dispatch)
API states: Programs
Dispatch Command: Compute Shader
Resource bindings: Buffer / Texture

Stateless Rendering

Native APIs are state machines
  Draw(3, TRIANGLES)
Behaviour depends on the current state.
[Diagram: Input Assembler, Vertex Shader, Rasterizer, Stream Out, Pixel Shader, Output Merger; every state and resource binding is unknown.]

Native APIs are state machines
  BindTexture( t )          // plug some resources in
  BindVertexBuffer( v )
  BindRenderTarget( r )
[Diagram: v, t and r are now bound; every other state is still unknown.]

Native APIs are state machines
  SetBlend( OPAQUE )        // configure some fixed-function bits
  SetShaderProgram( s )     // plug some procedures in
[Diagram: s and OPAQUE are now set alongside v, t and r; the remaining states are still unknown.]

Native APIs are state machines
  SetInputLayout( l )
  SetRaster( SOLID )
  SetDepthTest( DISABLED )
  Draw(3, TRIANGLES)
[Diagram: every state is now known: l, s, SOLID, DISABLED, OPAQUE, with v, t and r bound.]

State machine issues (and features)
- Objects can specify that they don't care about a state (by not setting it).
- "Don't care" states can be inherited from the calling logic.
  SetBlend( Translucent )
  House.Draw()
  Tree.Draw()

State machine issues (and features)
But this system of inherited state can be very fragile to code modifications.
  void House::Draw(){ SetBlend( Opaque ) Draw( 3, TRIANGLES ) }

  SetBlend( Translucent )
  House.Draw()
  Tree.Draw()               // uh oh! The tree now inherits Opaque from House::Draw

State machine issues (and features)
It can also lead to inefficiencies as your graphics programmers become pessimistic and set every state defensively.
  void Tree::Draw(){ SetBlend( Opaque )... }
  void House::Draw(){ SetBlend( Opaque )... }

Stateless Alternative
- Simplify the API: remove the entire state machine concept!
- Less mental tax: no worrying about leaky states.
- Retain the flexibility of "don't care" states, but remove the fragility that they have in state-machine APIs.

Draw Items
- Bundle all native API state and all resource bindings together into a Draw Item.
- Missing / "don't care" states are always filled in by some form of default value.
- Pipeline state: Input Layout, Raster, Depth / Stencil, Blend, Programs
- Resources: Buffer, Texture, Sampler
- Primitives: Draw Command

Draw Items
  draw_item = CreateDrawItem(... )
  Submit( draw_item )
Behaviour depends only on the contents of the draw item.
[Diagram: draw_item holds Layout (IA), Code (VS), Solid (Raster), Code (PS), Less and Opaque (OM), a "3 triangles" draw command, and the resources VB, Tex, Depth and Colour.]

Leaky states
- State leakage is now impossible.
- Every draw is completely independent and immune to code modifications in other drawing systems.
  Submit( House.GetDrawItem() )
  Submit( Tree.GetDrawItem() )

State Groups
- Container for Pipeline States and Resource Bindings.
- Plain-old-data, generated by a writer object.
  StateGroupWriter sgw
  sgw.begin()
  sgw.bindtexture( t )
  sgw.bindvertexbuffer( v )
  sgw.setblend( Opaque )
  sgw.setshaderprogram( s )
  StateGroup* sg = sgw.end()
[Diagram: the resulting state group contains Blend, Buffer, Programs and Texture.]
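
A minimal sketch of how such a writer could produce a plain-data state group. The entry layout, the fixed capacity of 16, the slot parameters and the capitalised method names are assumptions made for illustration; the slides do not show the real structures.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical state kinds; a real interface would have many more.
enum class StateType : uint8_t { Blend, ShaderProgram, Texture, VertexBuffer };

struct StateEntry {
    StateType type;
    uint8_t   slot;  // binding slot for resources, 0 for pipeline states
    uint16_t  id;    // small integer handle
};

// Plain data: a count plus a fixed-size array, so it can be copied with memcpy.
struct StateGroup {
    enum { kMaxEntries = 16 };  // assumed capacity
    uint8_t    count;
    StateEntry entries[kMaxEntries];
};
static_assert(std::is_trivially_copyable<StateGroup>::value,
              "state groups should stay plain data");

class StateGroupWriter {
public:
    void Begin() { group_ = StateGroup{}; }
    void BindTexture(uint8_t slot, uint16_t id)      { Push({StateType::Texture, slot, id}); }
    void BindVertexBuffer(uint8_t slot, uint16_t id) { Push({StateType::VertexBuffer, slot, id}); }
    void SetBlend(uint16_t id)                       { Push({StateType::Blend, 0, id}); }
    void SetShaderProgram(uint16_t id)               { Push({StateType::ShaderProgram, 0, id}); }
    StateGroup End() const { return group_; }

private:
    void Push(StateEntry e) {
        assert(group_.count < StateGroup::kMaxEntries);
        group_.entries[group_.count++] = e;
    }
    StateGroup group_{};
};
```

States that are never written simply do not appear in the group, which is what lets a later merge step treat them as "don't care".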

State Group Stacks
Allow different systems to contribute pipeline states and resource bindings.
  StateGroup* mesh =...        // Input Layout, Buffer
  StateGroup* material =...    // Blend, Programs, Raster, Texture
  StateGroup* stack[] = {material, mesh}

State Overrides
- Stack ordering dictates priority for overrides.
- Placing a state group at the front of the array causes its values to be chosen in any state conflicts.
  StateGroup* mesh =...        // Input Layout, Buffer
  StateGroup* material =...    // Blend, Programs, Raster, Texture
  StateGroup* override =...    // Blend
  StateGroup* stack[] = {override, material, mesh}
[Diagram: override and material both set Blend; the override group's Blend wins because it comes first.]

State Defaults
- Stack ordering dictates priority for overrides.
- Placing a state group at the back of the array causes its values to be chosen only as a fall-back.
  StateGroup* mesh =...        // Input Layout, Buffer
  StateGroup* material =...    // Blend, Programs, Raster, Texture
  StateGroup* defaults =...    // Input Layout, Blend, Raster, Depth / Stencil, Programs
  StateGroup* stack[] = {material, mesh, defaults}
[Diagram: only Depth / Stencil is taken from defaults; every other state is already supplied by material or mesh.]

Compiling a Draw Item
Given a stack and a draw command, they can be pre-compiled into a draw item.
  StateGroup* stack[] = {override, material, mesh, defaults}
  DrawCommand command = { 3, TRIANGLES }
  DrawItem* draw = Compile( stack, command )
Stack contents:
- override: Blend
- material: Blend, Programs, Raster, Texture
- mesh: Input Layout, Buffer
- defaults: Input Layout, Blend, Raster, Depth / Stencil, Programs
[Diagram: the compiled draw item holds one value per state (Blend from override; Programs, Raster and Texture from material; Input Layout and Buffer from mesh; Depth / Stencil from defaults) plus the Draw Command, covering Input Assembler, Vertex Shader, Rasterizer, Stream Out, Pixel Shader and Output Merger.]
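
The front-of-the-array-wins rule can be sketched as a first-match search per state. This is an illustration only: the fixed set of five state slots, the `kUnset` marker for "don't care", and the by-value `DrawItem` are assumptions, not the interface from the talk.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t kUnset = 0xFFFF;  // hypothetical "don't care" marker

enum Slot { kBlend, kRaster, kDepthStencil, kProgram, kInputLayout, kSlotCount };
enum class Topology { Triangles };

struct StateGroup  { uint16_t ids[kSlotCount]; };
struct DrawCommand { uint32_t primitiveCount; Topology topology; };
struct DrawItem    { uint16_t ids[kSlotCount]; DrawCommand command; };

// Creates a group in which every state is "don't care".
inline StateGroup MakeGroup() {
    StateGroup g;
    for (uint16_t& id : g.ids) id = kUnset;
    return g;
}

// For each state, take the value from the first group in the stack that sets it.
// Groups at the front act as overrides, groups at the back as defaults.
inline DrawItem Compile(const StateGroup* const* stack, int count, DrawCommand command) {
    DrawItem item;
    item.command = command;
    for (int s = 0; s < kSlotCount; ++s) {
        item.ids[s] = kUnset;
        for (int g = 0; g < count; ++g) {
            if (stack[g]->ids[s] != kUnset) {
                item.ids[s] = stack[g]->ids[s];
                break;
            }
        }
        assert(item.ids[s] != kUnset && "the defaults group must fill every state");
    }
    return item;
}
```

Because the result depends only on the stack and the command, it can be computed once and reused every frame.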

Render Passes
- Draw Items define all of the pipeline state except for the Depth/Stencil Target and Render Targets.
- Render Passes define these destination resources, plus the default and override state groups.
  RenderPass* pass = CreatePass( depth, color, defaults, override )
  StateGroup* stack[] = {material, mesh}
  DrawCommand command = { 3, TRIANGLES }
  DrawItem* draw = Compile( stack, command, pass )
  DrawItem* draws[] = { draw }
  Submit( pass, draws )

Resource Bindings

Resource IDs (and state IDs)
- Similar to GL, we use small integer types to refer to resource allocations & views.
- No reference counting: a higher level of the engine can wrap reference counting around this simple integer handle scheme if necessary (a la std::shared_ptr).
- Helps decouple platform-specific types from the client code.
- This can be a significant memory saving per compiled Draw Item:
  - Pointers are 64 bits!
  - Most resource IDs should fit in <16 bits.
  - Some kinds of state IDs might fit in <8 bits! (How many blend modes do you really use?)
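
As a rough illustration of the handle scheme, the sketch below resolves 16-bit IDs through a table and compares the size of ID-based bindings with pointer-based ones. `TextureTable`, `NativeTexture` and both binding structs are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using TextureId = uint16_t;  // hypothetical handle types
using BlendId   = uint8_t;

struct NativeTexture { int width, height; };  // stand-in for a platform-specific type

// Client code only ever sees TextureId; the table owns the platform objects.
class TextureTable {
public:
    TextureId Add(NativeTexture t) {
        assert(textures_.size() < 0xFFFF);
        textures_.push_back(t);
        return static_cast<TextureId>(textures_.size() - 1);
    }
    const NativeTexture& Resolve(TextureId id) const {
        assert(id < textures_.size());
        return textures_[id];
    }
private:
    std::vector<NativeTexture> textures_;
};

struct BindingsWithPointers { const NativeTexture* textures[4]; const void* blend; };
struct BindingsWithIds      { TextureId textures[4]; BlendId blend; };
static_assert(sizeof(BindingsWithIds) < sizeof(BindingsWithPointers),
              "integer handles are smaller than pointers");
```

On a typical 64-bit target the pointer version occupies 40 bytes against 10 for the ID version, which is the kind of saving the slide refers to.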

Resource slots
- Most resource binding points are arrays.
- Conflicts are resolved per individual array element.
  StateGroup* stack[] = {override, material}
[Diagram: override sets Sampler 1 and Sampler 2; material sets Blend, Programs, Sampler 0 and Sampler 1. Only Sampler 1 conflicts, and the override group's value is chosen.]

Resource slots
- Resource slots aren't named, only numbered:
  - Sampler 0, Sampler 1, Sampler 2
  - Constant Buffer 0, Constant Buffer 1, Constant Buffer 2
- Using this assumption at this level of the engine greatly simplifies development.
- Our shader programs struct can use a sampler bitmask of 0x05 to indicate that it uses sampler slot #0 and slot #2 (i.e. ((1<<0) | (1<<2)) == 0x5).
- The State Group conflict / merging system is built on super simple integer comparisons.

Resource slots
- Using numbered slots requires defining conventions:
  - Constant Buffer 0 is always used for the per-camera matrix data.
  - Constant Buffer 1 is always used for lighting data, etc.
- This is actually quite useful for "magic" engine-generated data that always conforms to a known (hard-coded) structure, such as camera matrices, which you want to plug into every object automatically.
- These are also a good use for the defaults/overrides state groups!

Resource slots
- Using named resources requires reflection.
- To bind data to a named slot, simply use the shader reflection system: check with the object's shader to discover the number that's associated with that name.
- This is useful for less rigidly defined structures, such as materials, which may change often during development and vary from object to object.
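
The slot bitmask arithmetic from the slide, written out as a small sketch. `ShaderProgramInfo`, `SlotBit` and `UsesSampler` are illustrative names, not part of the talk's interface.

```cpp
#include <cassert>
#include <cstdint>

// Bit N of the mask is set when sampler slot N is used.
constexpr uint32_t SlotBit(unsigned slot) { return 1u << slot; }

struct ShaderProgramInfo {
    uint32_t samplerMask;  // hypothetical field holding the used-slot bitmask
};

inline bool UsesSampler(const ShaderProgramInfo& program, unsigned slot) {
    assert(slot < 32);
    return (program.samplerMask & SlotBit(slot)) != 0;
}

// Slots #0 and #2 together give the 0x05 mask quoted on the slide.
static_assert((SlotBit(0) | SlotBit(2)) == 0x05, "slots 0 and 2");
```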

Input Assembler (D3D11)
Binding slots:
- Input Layout (formats, strides for each element)
- Index buffer (Buffer + offset)
- Vertex buffer(s) (Buffer + offset)
[Diagram: the API view of the pipeline, with the Input Assembler stage highlighted.]

Input Layouts and Vertex Shaders
Input layouts tell the VS where to find the vertex attributes.
- Stream #0 data: Position 1, Position 2, Position 3
- Stream #1 data: TexCoord 1, Normal 1, TexCoord 2, Normal 2, TexCoord 3, Normal 3
- Offset: / Stride: (values not preserved in the transcript)
  struct VS_Input_Full { float3 p : Position; float2 t : TexCoord; float3 n : Normal }
  struct VS_Input_Thin { float3 p : Position; }

Input Layouts and Vertex Shaders
Lua config files define stream formats (memory layouts for buffers) and vertex formats (VS input structures).
  StreamFormat("example_stream", {
    [VertexStream(0)] = {
      { Float32, 3, Position },
    },
    [VertexStream(1)] = {
      { Float32, 3, Normal },
      { Float32, 2, TexCoord, 0 },
    },
  })
  VertexFormat("VS_Input_Full", {
    { "p", float3, Position },
    { "t", float2, TexCoord, 0 },
    { "n", float3, Normal },
  })
  InputLayout( "example_stream", "VS_Input_Full" )
  InputLayout( "example_stream", "VS_Input_Thin" )
  InputLayout( "simple_stream", "VS_Input_Thin" )

Input Assembler (Simplified)
Binding slots:
- Vertex Data: Index buffer (Buffer ID + offset), Vertex buffer(s) (Buffer ID + offset)
- Instance Data: Vertex buffer(s) (Buffer ID + offset)
- Stream Format
The Input Layout is now hidden from the user.

Shader Resources (D3D11)
Binding slots (shown for the Pixel Shader; repeat for other shader stages):
- Constant Buffer View(s): Buffer
- Shader Resource View(s): Buffer, Texture
- Sampler(s)
- Unordered Access View(s): Buffer, Texture
[Diagram: the API view of the pipeline, with the shader stages highlighted.]

Resource Lists
- D3D11 allows for 128 texture slots per shader stage.
- Can we still allow the user to access a hundred textures without the overhead of managing a hundred binding points?
- How did APIs already solve this for constants / uniforms?
- Resource lists are "constant buffers (UBOs) for texture bindings".
- Similar to bindless resources. Ports well to Mantle/Vulkan/D3D12 descriptor lists!
- Only a small number of resource list binding points are required.
[Diagram: a Resource List containing Diffuse Map ID, Normal Map ID and Specular Map ID.]

Shader Resources (simplified)
Binding slots (all shader stages):
- Constant Buffer ID(s)
- Resource List ID(s): each list holds Buffer IDs / Texture IDs
- Sampler ID(s)
- Unordered Access View(s) (Buffer ID / Texture ID)

Draw Item Resources
The final size of each draw item is usually under one cache line.
- Resource List (2 to 256 bytes): Buffer ID / Texture ID
- Draw Item (32 to 80 bytes): Constant Buffer ID(s), Resource List ID(s), Sampler ID(s), Unordered Access View(s) (Buffer ID / Texture ID), Raster ID, Depth / Stencil, Blend ID, Program ID, Input Assembler Config ID, Draw Command
- Input Assembler Config (20 to 128 bytes): Vertex Data, Instance Data

State Group Resources
A final look at the actual State Group members (all optional):
- Constant Buffer ID(s)
- Resource List ID(s)
- Sampler ID(s)
- Unordered Access View(s) (Buffer ID / Texture ID)
- Raster ID
- Depth / Stencil
- Blend ID
- Vertex Data
- Instance Data
- Technique ID
- Shader Options
[Diagram: Technique ID and Shader Options in the State Group map to the Program ID in the Draw Item.]

Shaders

Program management
- Out of the box, shaders are hard to manage.
- One program = Pixel Shader + Vertex Shader (+ Geometry + Tessellation).
- Most objects/materials require more than one program:
  - Deferred rendering: write GBuffer attributes.
  - Forward rendering: compute all shading and lighting.
  - Shadow mapping: write depth only.
  - Material LOD: enable/disable features (e.g. normal mapping at a distance).
  - Loop unrolling: compile the shader once for each value of N.
- All of these programs grouped together form a single Technique.

Techniques, Passes, Options, Permutations
- A technique is a single shader file ("Effect" in MS lingo).
- Each technique contains several passes: GBuffer, Forward, Depth-Only, etc.
- Each pass can contain several options: Normal Mapping (y/n), Number of lights [0..8), etc.
- For each technique, for each pass, for each permutation of options, precompile the shader source file into a program.
- Careful: each 1-bit option doubles the number of programs!

[FX] syntax
- All the APIs we use (except mobile/Mac/Linux) use a shader language that is close enough to HLSL that we can just write all our shader code in HLSL!
- A header file full of #defines is enough to smooth over the small differences in syntax.
- However, resource declaration syntax varies widely:
  - Not all platforms support constant buffers (we support prev-gen / D3D9 / GL2 era).
  - Not all platforms support Resource Lists.
  - Not all platforms support separate Textures and Samplers.

[FX] syntax
- A small amount of code generation is used to smooth over these issues.
- We search for comment blocks of the pattern /*[FX] */ and execute their contents as Lua code.
- The Lua VM has been pre-registered with functions such as those below, to create a domain-specific language for declaring shader resources and techniques/passes/options:
  CBuffer( slot, stages, name, values )
  TextureList( slot, stages, name, values )
  Option( name, range )
  Pass( slot, name, parameters )

[FX] Examples
  CBuffer( 0, Pixel, 'Material', {
    { g_emissive = float },
  })
  TextureList( 0, Pixel, 'Material', {
    { Tex2D, 's_diffuse', 'Linear' },
  })
  Sampler( 0, {Pixel,Vertex}, 'Linear', {
    MinFilter = Linear, MagFilter = Linear, MipFilter = Linear,
    AddressU = Wrap, AddressV = Wrap, AddressW = Wrap,
  })

[FX] Examples
  Pass( 0, 'Opaque', {
    vertexshader = 'vs_main';
    pixelshader = 'ps_main';
    vertexlayout = { 'VS_Input_Full' };
    pixeloptions = 'LightCount';
  })

Shader Options
Shader options are all packed together into a bitmask.
  Option( 'NormalMapped' )                       -- pick a bit for me (use reflection!)
  Option( 'NormalMapped', {id=3} )               -- mask == 0x8 (i.e. 1<<3)
  Option( 'LightCount', {id=4, min=1, max=4} )

         7654 3210
  0x00 / 0000 0000 == LightCount: 1
  0x10 / 0001 0000 == LightCount: 2
  0x20 / 0010 0000 == LightCount: 3
  0x30 / 0011 0000 == LightCount: 4

Shader Options
Given a pass with:
  Option( 'NormalMapped', {id=0} )
  Option( 'LightCount', {id=4, min=1, max=4} )
The permutations would be:
         7654 3210
  0x00 / 0000 0000 == NormalMapped: 0, LightCount: 1
  0x01 / 0000 0001 == NormalMapped: 1, LightCount: 1
  0x10 / 0001 0000 == NormalMapped: 0, LightCount: 2
  0x11 / 0001 0001 == NormalMapped: 1, LightCount: 2
  0x20 / 0010 0000 == NormalMapped: 0, LightCount: 3
  0x21 / 0010 0001 == NormalMapped: 1, LightCount: 3
  0x30 / 0011 0000 == NormalMapped: 0, LightCount: 4
  0x31 / 0011 0001 == NormalMapped: 1, LightCount: 4
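
One way to generate that permutation list offline, sketched under the assumption that an option stores (value - min) shifted left by its id, which matches the LightCount table above. The `Option` struct and both function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical option description: 'id' is the lowest bit the option occupies.
// A boolean option is expressed as min = 0, max = 1.
struct Option {
    unsigned id;
    unsigned min;
    unsigned max;
};

// Encodes one option value into its bits: (value - min) << id.
inline uint32_t Encode(const Option& option, unsigned value) {
    assert(value >= option.min && value <= option.max);
    return (value - option.min) << option.id;
}

// Builds every combination of option values as a bitmask.
inline std::vector<uint32_t> EnumeratePermutations(const std::vector<Option>& options) {
    std::vector<uint32_t> masks{0};
    for (const Option& option : options) {
        std::vector<uint32_t> next;
        for (uint32_t base : masks)
            for (unsigned v = option.min; v <= option.max; ++v)
                next.push_back(base | Encode(option, v));
        masks = next;
    }
    return masks;
}
```

For NormalMapped (id 0) and LightCount (id 4, 1 to 4) this yields the same eight masks as the table, though in a different order.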

Program selection
- I lied earlier. I said that a Render Pass has just a depth texture, render target(s), a defaults state group and an overrides state group.
- A Render Pass also specifies a shader pass integer.
- Look up the technique, then look up the right pass within the technique, and then you've got a potentially long list of permutations.
[Diagram: State Group (Technique ID, Shader Options) + Render Pass (Pass ID) -> Draw Item (Program ID). Step 1, Step 2, Profit!]

Shader Options - runtime
Conflict/merging of shader options state is implemented a little differently.
- State Group: Technique ID, Shader Options (U32 value, U32 mask)
- State Group: value = 0x04, mask = 0x0F
- Render Pass: Pass ID, value = 0x80, mask = 0xF0
- Merged Options = 0x84
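
A sketch of how value/mask pairs might be merged. The slide only shows two non-overlapping masks; letting earlier entries win where masks overlap is an assumption carried over from the state group stack ordering, and `ShaderOptions` / `MergeOptions` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

struct ShaderOptions {
    uint32_t value;  // requested option bits
    uint32_t mask;   // which bits this contributor actually cares about
};

// Merges contributors in priority order. A bit is taken from the first
// entry whose mask covers it (assumed rule for overlapping masks).
inline uint32_t MergeOptions(const ShaderOptions* stack, int count) {
    uint32_t merged  = 0;
    uint32_t decided = 0;  // bits already claimed by an earlier entry
    for (int i = 0; i < count; ++i) {
        assert((stack[i].value & ~stack[i].mask) == 0 && "value must lie inside mask");
        uint32_t fresh = stack[i].mask & ~decided;
        merged  |= stack[i].value & fresh;
        decided |= fresh;
    }
    return merged;
}
```

With the slide's inputs (0x04 under mask 0x0F, 0x80 under mask 0xF0) the merged result is 0x84.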

Permutation selection
- When compiling your permutations, sort them by CountBitsSet(options_bitmask), so that permutations with more option bits set appear earlier in the array.
- At runtime, the user creates their own bitmask of requested features.
- Linearly search through the permutations list and stop when:
  (requested_options & permutation_options) == permutation_options
  i.e. stop as soon as you're not delivering options that weren't asked for.
- You won't necessarily be able to satisfy the user's request exactly, but this algorithm will give them the program that enables as many of their requests as possible.

Permutation Selection (code)
  int SelectProgramsIndex( u32 techniqueid, u32 passid, u32 featuresrequested )
  {
    Technique& technique = techniques[techniqueid];
    List<Pass>& passes = technique.passes;
    Pass& pass = passes[passid];
    List<Permutation>& permutations = pass.permutations;
    // Permutations are pre-sorted so that those with more option bits come first.
    for( int i = 0, end = permutations.count; i != end; ++i )
    {
      Permutation& permutation = permutations[i];
      // Accept the first permutation that enables nothing that wasn't requested.
      if( (featuresrequested & permutation.features) == permutation.features )
        return permutation.bindingidx;
    }
    return -1;
  }
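
The selection loop relies on the list being sorted by bit count first. Below is a self-contained sketch of both steps using standard containers in place of the engine's `List` type; `SortPermutations` and `SelectPermutation` are hypothetical names.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

struct Permutation {
    uint32_t features;    // option bits this program was compiled with
    int      bindingIdx;  // index of the compiled program
};

inline int CountBitsSet(uint32_t v) {
    return static_cast<int>(std::bitset<32>(v).count());
}

inline bool MoreBitsFirst(const Permutation& a, const Permutation& b) {
    return CountBitsSet(a.features) > CountBitsSet(b.features);
}

// Build-time step: permutations with more option bits come first.
inline void SortPermutations(std::vector<Permutation>& permutations) {
    std::stable_sort(permutations.begin(), permutations.end(), MoreBitsFirst);
}

// Runtime step: first permutation that enables nothing beyond the request.
inline int SelectPermutation(const std::vector<Permutation>& permutations, uint32_t requested) {
    assert(std::is_sorted(permutations.begin(), permutations.end(), MoreBitsFirst));
    for (const Permutation& p : permutations)
        if ((requested & p.features) == p.features)
            return p.bindingIdx;
    return -1;
}
```

Since a permutation with no option bits set matches every request, including one in each pass guarantees the search never returns -1.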

Q&A? @BrookeHodgman http://tiny.cc/gpuinterface

Thanks! @BrookeHodgman http://tiny.cc/gpuinterface

Bonus slides
That I was going to write but then I didn't

GLSL notes
- GL + GLSL are just specifications; vendors create implementations (which are all broken).
- Validate your shaders using the Khronos reference compiler*.
- Don't ship your source files.
- Implement a pre-processor for #include, etc.
- Obfuscate your shipping code if you feel the need.
- No guarantees that every vendor will optimize (or compile) your code properly!
- Implement a GLSL->AST->GLSL optimizing compiler. Or better: an HLSL->AST->GLSL optimizing compiler!
- Automate this!
* http://tiny.cc/khronos

Draw sorting
- Write a function that hashes a compiled Draw Item.
- More expensive state changes should be associated with more significant bits in the output.
[Diagram: the Draw Item's fields, grouped as "shader & pipeline state" (IA Config, Constant Buffer ID(s), Raster ID, Blend ID, Depth / Stencil, Program ID, Input Assembler Config ID, Draw Command) and "textures" (Resource List ID(s), Sampler ID(s), Unordered Access View(s)), pass through Hash to give a sorting key such as 0x12345678.]
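
A sketch of the "expensive changes in the high bits" idea. It packs IDs directly rather than hashing them, and both the field widths and the cost ranking (program, then resource list, then constant buffer) are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of a compiled draw item's state IDs.
struct DrawItemStates {
    uint8_t  programId;         // assumed most expensive to change: top 8 bits
    uint16_t resourceListId;    // middle 12 bits
    uint16_t constantBufferId;  // assumed cheapest to change: low 12 bits
};

inline uint32_t MakeSortKey(const DrawItemStates& s) {
    assert(s.resourceListId < (1u << 12));
    assert(s.constantBufferId < (1u << 12));
    return (uint32_t(s.programId) << 24)
         | (uint32_t(s.resourceListId) << 12)
         |  uint32_t(s.constantBufferId);
}
```

Sorting draws by this key groups together everything that shares a program, then everything that shares a resource list within that program, and so on.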

Transparent Draw sorting
- Alpha-blended geometry must be rendered from back to front.
- Don't use the draw item's hash; use its distance from the camera.
[Diagram: Distance -> Depth -> ~*(u32*)distance -> sorting key, e.g. 0xABCDEF12.]
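
The bit trick works because the bit patterns of non-negative IEEE-754 floats increase with their values, so inverting them makes the farthest object sort first in an ascending sort. A sketch using `memcpy` instead of a pointer cast to stay within the aliasing rules; `BackToFrontKey` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns a key that sorts ascending from the farthest to the nearest object.
// Only valid for non-negative distances.
inline uint32_t BackToFrontKey(float distance) {
    assert(distance >= 0.0f);
    static_assert(sizeof(uint32_t) == sizeof(float), "expects 32-bit floats");
    uint32_t bits;
    std::memcpy(&bits, &distance, sizeof bits);
    return ~bits;
}
```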

Hybrid Draw sorting
- For opaque geometry to make use of Hi-Z, you want to render front-to-back.
- However, you also want to sort by state to reduce CPU costs.
- Compromise by using a hybrid.
[Diagram: Distance -> Coarse Depth (0xABCD) is merged with the original sorting key (hash 0x12345678) to give the new sorting key 0xABCD1357.]
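
One possible merge, sketched with assumptions: the coarse depth is the top 16 bits of the float's bit pattern, and the state key is squeezed into the remaining bits by keeping its upper 16 bits. The slide's example compresses the hash differently, and its exact scheme is not specified, so treat `HybridSortKey` as an illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

inline uint32_t FloatBits(float f) {
    static_assert(sizeof(uint32_t) == sizeof(float), "expects 32-bit floats");
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// High 16 bits: coarse depth (nearer sorts first). Low 16 bits: the most
// significant half of the state key, so draws in the same depth bucket are
// still grouped by their most expensive state.
inline uint32_t HybridSortKey(float distance, uint32_t stateKey) {
    assert(distance >= 0.0f);
    uint32_t coarseDepth = FloatBits(distance) >> 16;  // exponent + top 7 mantissa bits
    return (coarseDepth << 16) | (stateKey >> 16);
}
```

Fewer depth bits give larger buckets and therefore more state-sorted runs; more depth bits give stricter front-to-back ordering.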

Redundant state filtering
- Each draw item is a very compact structure, containing state IDs.
- XORing two draw items creates a bitmask that highlights any changes.
- Masking out sections of that bitmask and comparing them to zero lets you quickly check whether a state has changed since the previous draw item.
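
A sketch of the XOR test on a packed 64-bit word. The bit layout (program in bits 0-15, blend in 16-23, raster in 24-31) and the function names are assumptions; a real draw item would span several words.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the packed state word.
constexpr uint64_t kProgramMask = 0xFFFFull;
constexpr uint64_t kBlendMask   = 0xFFull << 16;
constexpr uint64_t kRasterMask  = 0xFFull << 24;

inline uint64_t PackStates(uint16_t program, uint8_t blend, uint8_t raster) {
    return uint64_t(program) | (uint64_t(blend) << 16) | (uint64_t(raster) << 24);
}

// XOR leaves a 1 wherever the two draw items differ; the mask selects one state.
inline bool StateChanged(uint64_t previous, uint64_t next, uint64_t mask) {
    assert(mask != 0);
    return ((previous ^ next) & mask) != 0;
}
```

When submitting a sorted list, only the states for which `StateChanged` returns true need to be sent to the native API.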

Resource Management

Data conditioning / compilation

Shader compilation fun

Devices, contexts & command lists

Devices

Contexts

Multithreading on old APIs

Higher level layer examples

Scene manager

Materials

Lighting