Low-Overhead Rendering with Direct3D. Evan Hart Principal Engineer - NVIDIA

Similar documents
The Application Stage. The Game Loop, Resource Management and Renderer Design

DirectX 11 First Elements

DX10, Batching, and Performance Considerations. Bryan Dudash NVIDIA Developer Technology

Technical Report. Mesh Instancing

EECS 487: Interactive Computer Graphics

Order Matters in Resource Creation

Working with Metal Overview

Chapter Answers. Appendix A. Chapter 1. This appendix provides answers to all of the book s chapter review questions.

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

Vulkan Multipass mobile deferred done right

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

Dominic Filion, Senior Engineer Blizzard Entertainment. Rob McNaughton, Lead Technical Artist Blizzard Entertainment

Shaders in Eve Online Páll Ragnar Pálsson

Rendering Grass with Instancing in DirectX* 10

Could you make the XNA functions yourself?

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Vulkan: Architecture positive How Vulkan maps to PowerVR GPUs Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics.

Vulkan (including Vulkan Fast Paths)

ADVANCED RENDERING WITH DIRECTX 12

PowerVR Hardware. Architecture Overview for Developers

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

Getting fancy with texture mapping (Part 2) CS559 Spring Apr 2017

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Beyond Programmable Shading. Scheduling the Graphics Pipeline

Vulkan: Scaling to Multiple Threads. Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics

Maximizing Face Detection Performance

15 Sharing Main Memory Segmentation and Paging

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

Direct3D 11 Performance Tips & Tricks

Lecture 9: Deferred Shading. Visual Computing Systems CMU , Fall 2013

Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express

Optimisation. CS7GV3 Real-time Rendering

GPU Memory Model. Adapted from:

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

GDC 2014 Barthold Lichtenbelt OpenGL ARB chair

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

Performance Analysis and Culling Algorithms

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y.

Direct3D API Issues: Instancing and Floating-point Specials. Cem Cebenoyan NVIDIA Corporation

Lecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Lecture 25: Board Notes: Threads and GPUs

PERFORMANCE OPTIMIZATIONS FOR AUTOMOTIVE SOFTWARE

16 Sharing Main Memory Segmentation and Paging

Optimizing Mobile Games with ARM. Solo Chang Staff Applications Engineer, ARM

PowerVR Series5. Architecture Guide for Developers

Mobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools

Rasterization and Graphics Hardware. Not just about fancy 3D! Rendering/Rasterization. The simplest case: Points. When do we care?

Profiling and Debugging Games on Mobile Platforms

Efficient and Scalable Shading for Many Lights

Per-Face Texture Mapping for Realtime Rendering. Realtime Ptex

In the previous presentation, Erik Sintorn presented methods for practically constructing a DAG structure from a voxel data set.

Understanding M3G 2.0 and its Effect on Producing Exceptional 3D Java-Based Graphics. Sean Ellis Consultant Graphics Engineer ARM, Maidenhead

AMD RADEON GFX & DIRECTX 12 GRAPHICS CORE NEXT BETTER PREPARED FOR DIRECTX 12 ROBERT HALLOCK AMD TECHNICAL MARKETING APPROVED FOR ALL AUDIENCES

Mali Developer Resources. Kevin Ho ARM Taiwan FAE

Current Trends in Computer Graphics Hardware

Mobile 3D Devices. -- They re not little PCs! Stephen Wilkinson Graphics Software Technical Lead Texas Instruments CSSD/OMAP

Enabling immersive gaming experiences Intro to Ray Tracing

Dynamic Ambient Occlusion and Indirect Lighting. Michael Bunnell NVIDIA Corporation

Metal for OpenGL Developers

Intel Core 4 DX11 Extensions Getting Kick Ass Visual Quality out of the Latest Intel GPUs

Here s the general problem we want to solve efficiently: Given a light and a set of pixels in view space, resolve occlusion between each pixel and

Ultimate Graphics Performance for DirectX 10 Hardware

Mobile HW and Bandwidth

Raise your VR game with NVIDIA GeForce Tools

CS399 New Beginnings. Jonathan Walpole

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

PowerVR Performance Recommendations The Golden Rules. October 2015

Broken Age's Approach to Scalability. Oliver Franzke Lead Programmer, Double Fine Productions

Cheap and Dirty Irradiance Volumes for Real-Time Dynamic Scenes. Rune Vendler

Hidden surface removal. Computer Graphics

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015

Deus Ex is in the Details

Computer Graphics. Lecture 9 Hidden Surface Removal. Taku Komura

Shadows & Decals: D3D10 techniques from Frostbite. Johan Andersson Daniel Johansson

CMSC427 Advanced shading getting global illumination by local methods. Credit: slides Prof. Zwicker

! Readings! ! Room-level, on-chip! vs.!

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

White Paper. Texture Arrays Terrain Rendering. February 2007 WP _v01

GPU Computation Strategies & Tricks. Ian Buck NVIDIA

Today s Agenda. DirectX 9 Features Sim Dietrich, nvidia - Multisample antialising Jason Mitchell, ATI - Shader models and coding tips

ECE 574 Cluster Computing Lecture 16

The Witness on Android Post Mortem. Denis Barkar 3 March, 2017

Shadows for Many Lights sounds like it might mean something, but In fact it can mean very different things, that require very different solutions.

Vulkan API 杨瑜, 资深工程师

Practical Rendering And Computation With Direct3D 11 Free Pdf Books

Achieving High-performance Graphics on Mobile With the Vulkan API

Real-Time Hair Simulation and Rendering on the GPU. Louis Bavoil

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

Introduction to Multithreaded rendering and the usage of Deferred Contexts in DirectX 11

Explicit Multi GPU Programming with DirectX 12. Juha Sjöholm Developer Technology Engineer NVIDIA

Soft Particles. Tristan Lorach

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

H.264 Decoding. University of Central Florida

Moving Mobile Graphics Advanced Real-time Shadowing. Marius Bjørge ARM

Skinned Instancing. Bryan Dudash

POWERVR MBX. Technology Overview

Sorting and Searching. Tim Purcell NVIDIA

Voxel Cone Tracing and Sparse Voxel Octree for Real-time Global Illumination. Cyril Crassin NVIDIA Research

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Applications of Explicit Early-Z Z Culling. Jason Mitchell ATI Research

Transcription:

Low-Overhead Rendering with Direct3D Evan Hart Principal Engineer - NVIDIA

Ground Rules No DX9 Need to move fast Big topic in 30 minutes Assuming experienced audience Everything is a tradeoff These are suggestions if an app is SW limited

Why care about overhead? Overhead translates to draws per second Everyone wants more draws More detailed/interesting world Further draw distance More shadows

Do I have a problem? Optimizing the wrong thing is harmful All games are CPU limited somewhere Also likely GPU-bound somewhere Need to find the tradeoff for your game Genre/style can influence balance Online / strategy can be worse

SetVB SetSRV SetCB Draw SetVB SetSRV SetCB Draw Game / Driver Interaction Game Thread Driver Thread

Game/Driver Interaction Driver thread does most real API work Not easily visible in app threads App threads just queue commands Typically very fast Deferred context is slightly different Driver work is done on app thread Work is kept minimal though

State of the World 5 Million draws / second is feasible >50k draws / scene 290K draws / second w/ state changes Change most relevant state Drawing is cheap Object references are expensive

Full Draw Call Ctx->IASetInputLayout( ); Ctx->IASetVertexBuffers( ); Ctx->IASetIndexBuffer( ); Ctx->VSSetShader( ); Ctx->Map( ); Ctx->VSSetConstantBuffers( ); Ctx->PSSetShader( ); Ctx->Map( ); Ctx->PSSetConstantBuffers( ); Ctx->PSSetShaderResources( ); Ctx->PSSetSamplers( ); Ctx->DrawIndexed( );

Cost of the Full Draw Call Binding 5+ GPU memory objects Vertices, Indices, Constants, Textures Two memory management operations Map + DISCARD Whole bunch of indirections and cache misses

User costs for Full draw call ~14 COM calls Copying of data tied to buffer update Gathering of any data needed to make the calls Map probably looks the most expensive Requires driver to provide an immediate response

Driver costs for a Full draw Every object ref potentially requires Pointer indirection Object lifetime management Object residency management Map + Discard requires Rename active object Manage memory pool

What can be done? Parallelize the load Reduce non-draw functions Reduce draw calls Seems counter-intuitive to our goals

Deferred Contexts Can allow the load to be spread out Biggest help is Map related to constants 30-50% faster isn t unreasonable On drivers that support them natively Not a panacea as the driver thread still does a lot of serial work

Reducing Draw Setup Übershaders Optimize Constant Buffers Sub-allocate Vertex and Index Buffers Managing SRVs Managing Samplers

Ubershaders Changing shaders snowballs costs Shader is connected to everything Conditionals are fairly cheap if (hasspec) / for( numlayers ) Can have a negative impact on GPU costs Secondary factors like register pressure

Reducing Constant Buffer Costs Per-Frame Constants Per-View Constants Per-Draw Constants Pixel Shader Vertex Shader

Shared Constant Buffers DX 11.1 PSSetConstantBuffers1( // Parameters from older methods UINT StartSlot, UINT NumBuffers, ID3DBuffer *const *Buffers, // Offset in number of constants (16 bytes each) const UINT* FirstConstant, ) // Size of the block in constants ( Num % 16 == 0) const UINT* NumConstants

Shared Constant Buffers DX 11 // Vertex Shader // VS constants at the beginning float4x4 WVP_matrix : packoffset( c0 ); float4x4 W_matrix : packoffset( c4 ); // Pixel Shader // PS constants later float4 Ambient : packoffset( c16 );

Vertex and Index Suballocation Simple concept Multiple objects can share the same buffer Most games standardize vertex formats Handful of configurations DrawIndexed offers BaseIndex/BaseVertex Vastly cuts down on # of IASetXXX calls

SRV Management Basics Assign slots for global textures Shadow maps, environment maps Don t Clean-up slots Binding NULL does have a cost Group SRVs that change together Albedo and normal

SRV Management Advanced Suballocate from texture arrays Sizes and format are typically quasi-standard Terrain tiles are a fantastic example Just need to add an extra index to CB Need to take some care Allocating the max array for all combinations probably doesn t make sense

Sampler Management Repeat SRV basic advice Samplers don t change much Fairly limited set of common choices

Revised Draw Call // Setup: // ubershader constants // texture array indices // normal constants, like transform Ctx->Map( ); // Don t need to rebind CBs // Offsets in multiuse IB/VB // StartIndexLocation -> offset to IB // VertexOffset -> offset for vertices Ctx->DrawIndexed( );

DX 11.1 is better still Use one Map() for batch of draw calls Change offset into constant buffer per draw

Reducing draw calls You ve heard it for years Instancing Nearly every game could benefit Same model drawn N times in a frame There are practical concerns Depth sorting Constant buffer size

Wrap-up Analyze your situation Consider API options Get the engine out of the way See appendix slides online

Thanks NVIDIA Devtech team NVIDIA D3D Driver Performance Team Bryan Dudash Dan Baker