Intel Core 4 DX11 Extensions Getting Kick Ass Visual Quality out of the Latest Intel GPUs

Similar documents
Analyze and Optimize Windows* Game Applications Using Intel INDE Graphics Performance Analyzers (GPA)

Rendering Grass with Instancing in DirectX* 10

Using Tasking to Scale Game Engine Systems

Intel Setup and Configuration Service. (Lightweight)

Collecting OpenCL*-related Metrics with Intel Graphics Performance Analyzers

Software Occlusion Culling

Intel Desktop Board DZ68DB

Case Study: Optimizing King of Soldier* with Intel Graphics Performance Analyzers on Intel HD Graphics 4000

Intel G31/P31 Express Chipset

Increase your FPS. with CPU Onload Josh Doss. Doug McNabb.

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing

SELINUX SUPPORT IN HFI1 AND PSM2

Soft Particles. Tristan Lorach

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

Intel Platform Administration Technology Quick Start Guide

Jomar Silva Technical Evangelist

Increase your FPS with CPU Onload

Intel AppUp SM developer program and Native Apps

Graphics Pass-through with VT-d

Intel Stereo 3D SDK Developer s Guide. Alpha Release

Efficient scaling in a Task-Based Game Engine

Intel X48 Express Chipset Memory Controller Hub (MCH)

Migration Guide: Numonyx StrataFlash Embedded Memory (P30) to Numonyx StrataFlash Embedded Memory (P33)

Intel Desktop Board DH61SA

LED Manager for Intel NUC

Intel Cache Acceleration Software for Windows* Workstation

OIT to Volumetric Shadow Mapping, 101 Uses for Raster Ordered Views using DirectX 12

Clear CMOS after Hardware Configuration Changes

Intel Open Source HD Graphics, Intel Iris Graphics, and Intel Iris Pro Graphics

Android on Everything! Smooth Development of Cross-platform Native Android Games

Ultrabook Convertible Application Design Considerations

Intel Desktop Board D975XBX2

Evolving Small Cells. Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure)

MAXIS-mizing Darkspore*: A Case Study of Graphic Analysis and Optimizations in Maxis Deferred Renderer

Intel Desktop Board D945GCLF2

Intel 945(GM/GME)/915(GM/GME)/ 855(GM/GME)/852(GM/GME) Chipsets VGA Port Always Enabled Hardware Workaround

Intel Setup and Configuration Service Lite

Intel Desktop Board DP67DE

White Paper. Texture Arrays Terrain Rendering. February 2007 WP _v01

Intel Serial to Parallel PCI Bridge Evaluation Board

How to Create a.cibd File from Mentor Xpedition for HLDRC

Drive Recovery Panel

Performance OpenGL Programming (for whatever reason)

How to Create a.cibd/.cce File from Mentor Xpedition for HLDRC

Intel Desktop Board D945GCLF

Dynamic Resolution Rendering

Intel Desktop Board D945GCCR

Spring 2009 Prof. Hyesoon Kim

Intel Desktop Board DH61CR

CS451Real-time Rendering Pipeline

Version 1.0. Intel-powered classmate PC Arcsoft WebCam Companion 3* Training Foils. *Other names and brands may be claimed as the property of others.

Spring 2011 Prof. Hyesoon Kim

Intel Desktop Board D946GZAB

Forging a Future in Memory: New Technologies, New Markets, New Applications. Ed Doller Chief Technology Officer

Intel Desktop Board DG41CN

INTEL PERCEPTUAL COMPUTING SDK. How To Use the Privacy Notification Tool

Could you make the XNA functions yourself?

Rasterization and Graphics Hardware. Not just about fancy 3D! Rendering/Rasterization. The simplest case: Points. When do we care?

Com S 336 Final Project Ideas

Technical Report. Anisotropic Lighting using HLSL

Intel IXP400 Digital Signal Processing (DSP) Software: Priority Setting for 10 ms Real Time Task

Boot Agent Application Notes for BIOS Engineers

Intel 848P Chipset. Specification Update. Intel 82848P Memory Controller Hub (MCH) August 2003

Intel Desktop Board DG31PR

Innovating and Integrating for Communications and Storage

Intel Desktop Board D915GUX Specification Update

Intel Desktop Board D915GEV Specification Update

Intel Atom Processor Based Platform Technologies. Intelligent Systems Group Intel Corporation

Intel Thread Checker 3.1 for Windows* Release Notes

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor

KVM for IA64. Anthony Xu

BIOS Update Release Notes

Intel Desktop Board DG41RQ

Intel Desktop Board DP55SB

Installation Guide and Release Notes

Intel Desktop Board DH55TC

The Rasterization Pipeline

Intel vpro Technology Virtual Seminar 2010

The Application Stage. The Game Loop, Resource Management and Renderer Design

2.11 Particle Systems

CS 354R: Computer Game Technology

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

Intel Atom Processor E3800 Product Family Development Kit Based on Intel Intelligent System Extended (ISX) Form Factor Reference Design

Intel Cache Acceleration Software - Workstation

Intel Virtualization Technology Roadmap and VT-d Support in Xen

Order Independent Transparency with Dual Depth Peeling. Louis Bavoil, Kevin Myers

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

How to Configure Intel X520 Ethernet Server Adapter Based Virtual Functions on SuSE*Enterprise Linux Server* using Xen*

Intel Desktop Board D102GGC2 Specification Update

2013 Intel Corporation

Making Nested Virtualization Real by Using Hardware Virtualization Features

Intel Desktop Board DQ57TM

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015

Upgrading Intel Server Board Set SE8500HW4 to Support Intel Xeon Processors 7000 Sequence

Software Evaluation Guide for ImTOO* YouTube* to ipod* Converter Downloading YouTube videos to your ipod

Intel Education Theft Deterrent Release Note WW16'14. August 2014

Intel 82580EB/82580DB GbE Controller Feature Software Support. LAN Access Division (LAD)

Vulkan: Architecture positive How Vulkan maps to PowerVR GPUs Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics.

Technical Report. Mesh Instancing

Installation Guide and Release Notes

Transcription:

Intel Core 4 DX11 Extensions Getting Kick Ass Visual Quality out of the Latest Intel GPUs Steve Hughes: Senior Application Engineer - Intel www.intel.com/software/gdc Be Bold. Define the Future of Software.

The unspoken problems in Graphics How do I sort alpha polygons quickly? How do I sort alpha polygons correctly? How do I achieve programmable blending in DirectX? How do I transfer data to and from GPU memory efficiently? 2

Why do I mean by alpha polygons? Lets imagine in our game we have these 4 polygons. They are in our 3D scene, and this is what we expect to see. Viewing Direction A Viewing Direction B View from direction A View from Direction B 3

How do we make that happen? They have to be applied to the screen in back to front order otherwise Viewing Direction Draw Order Draw Order 4

What about the z-buffer? If you use the z-buffer you don t get all your pixels Viewing Direction Draw Order Draw Order 5

Is there a right way to do this at all? Yes, you sort them back to front and draw them without z after all of the opaque polygons in the scene. Sort is the key word Viewing Direction Ok, so they look right - sorting 4 polygons isn't too expensive though, is it? Draw Order 6

A more realistic example! High poly count tree: All leaf models are in the same vertex buffer All leaf edges have Alpha values Tree is animated Need to re sort every frame to draw correctly Very expensive to get right! 7

Finally lets look at some pictures Screenshot from Rome 2: Total War From GDC Rome2: 2013 Total War. Showing complexity of modern transparency passes Reproduced by permission from Creative Assembly 8

So, do we have to sort millions of poly s every frame??? Viewing Direction Group 7 Group 3 Group 2 Group 1 Group 5 Group 4 Group 10 Well, no. Poly s split into groups Sorted within the group Then sort the groups Does that work? Plan view, showing groups of transparent polygons Trees, fires, shrubs, smoke areas, windows, anything 9

And, does it work? Viewing Direction Group 3 Group 2 Group 7 Well, no. It does as long as you take steps to keep groups apart. 2 things go wrong here: One group will always draw over another Polygons will intersect others When Polygons Intersect (Discovery Channel, Friday, 9pm) 10

A closer look at intersecting polygons Correct render order Different correct render order No sort plan will ever let you sort the pixels in these 4 polygons correctly! 11

Does that ever happen, really? Go on, ask an artist! 12

Pause for thought Scene sorting doesn t work. Its also expensive. This is an old, old problem. Can anything help How about Unordered Access Views (UAVs) and Structured Buffers? 13

How about Unordered Access Views and Structured Buffers? What are they? Structured Buffer New resource type to DX11. Each entry is a fixed size structure defined by the user. Unordered Access View (UAV) View of a buffer which allows scatter / gather operations hence unordered Shader can write to any location. How does that help us? 14

2 pass proposition for solving the alpha sort problem: Pass 1 pixel shader Stores first n color/alpha/z triplets in the buffer. Stores remaining colors by merging the triplet into one of first n colors as appropriate. Buffer data Per screen loc RGB A 0.7 0.3 0.9 0.5 Z 0.1 0.2 0.6 0.8 Pass 2 pixel shader Pixel Shader Sorts layers, then blends them onto render target + + + Pixel 123, 67 Screen loc 123, 67 In buffer After the first n colors, successive colors are adaptively merged into one of the first n. Great example of how to do this here: http://software.intel.com/en-us/articles/adaptive-transparency 15

Why won t that work?: 1. Pixels arrive in random order (it s a UAV, right?) after the first 4, pixels can be merged into the buffer in an inconsistent order on successive frames. 2. We seem to think we can read / modify / write the render target repeatedly Spoiler alert: Algorithm will result in temporal artifacts lets see why 16

Our input comes from an array of EUs Input pixels EU s (Execution Units) going flat out! EU EU EU EU EU EU EU No guaranteed order for UAV access events. UAV access occurs in random order, so triplets after the first n will be merged into different slots, so we don t get the same image every frame. Net result: Nice idea then, but it won t work as-is 17

Enter Pixel Shader Ordering to save the day! Green threads writing to unique pixels Executed as normal for UAV Pipeline unchanged, all executed is in parallel. Blue and red threads writing to same pixel In this case Red thread issued before blue Original render order reapplied Determinacy guaranteed. Pixel Shader execution on multiple EUs Pixel Shader Ordering Only specific colliding threads are serialized to minimize pipeline interference. 18

How do you order a UAV? Some new HLSL functions. IntelExt_Init() o Initializes the interface to the extensions. IntelExt_BeginPixelShaderOrdering(); o All shader access to any UAV after the call will be serialized in original primitive order 19

How to Composite Transparent and Opaque Geometry Traditional blend math using Alpha and (1 Alpha) R*(1-a1)+c1*a1 (R*(1-a1)+c1*a1)*(1-a2)+c2*a2 ((R*(1-a1)+c1*a1)*(1-a2)+c2*a2)*(1-a3)+c3*a3 Which we can reshuffle to give us this: Alpha = (1-a1)*(1-a2)*(1-a3) Color = c1*a1*(1-a2)*(1-a3)+c2*a2*(1-a3)+c3*a3 Which we can blend like this: SrcBlend will be D3D11_BLEND_ONE DstBlend will be D3D11_BLEND_SRC_ALPHA BlendOp will be D3D11_BLEND_OP_ADD Allowing us to blend our multiple Color + Alpha pairs to the render target In a single blend op. 20

About, Blend modes. From DX Specifications: In reality, we have 3 parameters to the blend pipeline: SrcBlend What to scale the value in the shader by DstBlend what to scale the value in the render target by BlendOp how to combine the 2 results The important thing here is that this is all color blending. If you want to get more complex, you cant 21

Deferred rendering 101... Geometry Buffer (G-Buffer): All scene models rendered to an intermediate buffer (called a G-Buffer) G-Buffer pixels contain normals, materials, texture information etc. in screen space. A second pass composites all this data with light data and shadow data to the screen. Well documented and well used. Unanswered deferred rendering problem: -How do I add a decal of, say, a bullet hole onto the G-Buffer? 22

Using Normal Decals on deferred surfaces To apply a bullet hole or an axe mark simply Render your G-Buffer Take a normal map of a bullet hole Blend it with the G-Buffer Result will be a correctly mapped bullet hole G-buffer Normal data But you can t do that with the blend stage in D3D! 23

Blending Normals Solved! You can do it with a structured buffer and a UAV, if you use Pixel Shader Ordering to avoid clashes. Normals can be safely blended together creating Correct surface deformations. Potentially, this solves it completely! Time to think The entries in the structured buffer do not need to be colors. We could store normals there, or anything else for that matter! 24

So far, then How do I sort alpha polygons quickly? - DONE How do I sort alpha polygons correctly? - DONE How do I achieve programmable blending in DirectX? - DONE How do I transfer data to and from GPU memory efficiently? 25

What do we mean, transfer data quickly? To upload a texture, buffer, anything else System Ram CPU DirectX Driver PCI Express Bus GPU GPU Ram Dictates load times Restricts per frame upload & download volumes Physically limits algorithm development 26

Realisation System Ram CPU DirectX Driver PCI Express Bus GPU GPU Ram Wait a minute On processor graphics systems, these are the same thing Hmm, try telling DirectX that! 27

Redesigning the house Nice fat pipe in here Send the actual address of the buffer to the CPU This is over there on the CPU System Ram CPU DirectX Driver PCI Express Bus GPU GPU Ram Knock a hole through this Pop a bit of code in here Completely ignore this 28

The Instant Access extension. CreateSharedTexture2D( LPD3D11Device, D3D11_TEXTURE2D_DESC, LPD3D11Texture2D *gputex, LPD3D11Texture2D *cputex) CopyResource(cputex, gputex); Helper function to create a shared texture. gputex will be its name in all things GPU side, binding etc cputex will be its name on CPU side map etc CopyResource() command trapped by driver Links the textures together Both refer to the same memory location 29

How do you detect the extensions? Include the helper library, then CAPS_EXTENSION intelextcaps; ZeroMemory( &intelextcaps, sizeof(caps_extension) ); if( S_OK!= GetExtensionCaps( pd3ddevice, &intelextcaps ) ) return E_FAIL; if( intelextcaps.driverversion < EXTENSION_INTERFACE_VERSION_1_0 ) return E_FAIL; return S_OK; 30

Concluding... We asked: How do I sort alpha polygons quickly? You can t get much quicker than not doing it, sure there is a little bit of extra work in the shader. We asked: How do I sort alpha polygons correctly? We found out there was no solution to some issues, until we changed the game with Pixel Shader Ordering. Then we asked: How do I achieve programmable blending in DirectX? And we looked at one trick involving Pixel Shader Ordering which will change the game. Finally, we asked: How do I transfer data to and from GPU memory efficiently? We found out that the Instant Access extension lets us copy to and from GPU ram very quickly. 31

And finally Always check out http://software.intel.com/sites/billboard/ to see what's new Go check out the Intel Booth some of this stuff is being demonstrated there. Lots of other cool stuff there too! Don t forget to fill out the questionnaire. 32

Questions? 33

Legal Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights. Intel may make changes to specifications, product descriptions, and plans at any time, without notice. The Intel processor and/or chipset products referenced in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. All dates provided are subject to change without notice. All dates specified are target dates, are provided for planning purposes only and are subject to change. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. * Other names and brands may be claimed as the property of others. Copyright 2012, Intel Corporation. All rights reserved. 34

35