New GPU Features of NVIDIA s Maxwell Architecture

Similar documents
A Trip Down The (2011) Rasterization Pipeline

Direct3D 11 Performance Tips & Tricks

Advanced Ambient Occlusion Methods for Modern Games

After the release of Maxwell in September last year, a number of press articles appeared that describe VXGI simply as a technology to improve

Rendering Grass with Instancing in DirectX* 10

ADVANCED RENDERING WITH DIRECTX 12

HIGH-QUALITY RASTERIZATION CHRIS WYMAN SENIOR RESEARCH SCIENTIST, NVIDIA

NVIDIA Parallel Nsight. Jeff Kiel

NVIDIA RESEARCH TALK: THE MAGIC BEHIND GAMEWORKS HYBRID FRUSTUM TRACED SHADOWS

CS451Real-time Rendering Pipeline

MSAA- Based Coarse Shading

AGGREGATE G-BUFFER ANTI-ALIASING

Graphics Processing Unit Architecture (GPU Arch)

CS427 Multicore Architecture and Parallel Computing

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

Shadows for Many Lights sounds like it might mean something, but In fact it can mean very different things, that require very different solutions.

Here s the general problem we want to solve efficiently: Given a light and a set of pixels in view space, resolve occlusion between each pixel and

Octree-Based Sparse Voxelization for Real-Time Global Illumination. Cyril Crassin NVIDIA Research

Voxel Cone Tracing and Sparse Voxel Octree for Real-time Global Illumination. Cyril Crassin NVIDIA Research

Spring 2009 Prof. Hyesoon Kim

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

Acknowledgement: Images and many slides from presentations by Mark J. Kilgard and other Nvidia folks, from slides on developer.nvidia.

Introducing Metal 2. Graphics and Games #WWDC17. Michal Valient, GPU Software Engineer Richard Schreyer, GPU Software Engineer

OpenGL Status - November 2013 G-Truc Creation

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Real-Time Volumetric Smoke using D3D10. Sarah Tariq and Ignacio Llamas NVIDIA Developer Technology

Course Recap + 3D Graphics on Mobile GPUs

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

Evolution of GPUs Chris Seitz

ADAPTIVE TEMPORAL ANTIALIASING

Working with Metal Overview

The Application Stage. The Game Loop, Resource Management and Renderer Design

Spring 2011 Prof. Hyesoon Kim

Programming Graphics Hardware

Optimisation. CS7GV3 Real-time Rendering

The Ultimate Developers Toolkit. Jonathan Zarge Dan Ginsburg

DEFERRED RENDERING STEFAN MÜLLER ARISONA, ETH ZURICH SMA/

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

Radeon ProRender and Radeon Rays in a Gaming Rendering Workflow. Takahiro Harada, AMD 2017/3

Programming Tips For Scalable Graphics Performance

Rasterization Overview

Tiled shading: light culling reaching the speed of light. Dmitry Zhdan Developer Technology Engineer, NVIDIA

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Streaming Massive Environments From Zero to 200MPH

Lecture 9: Deferred Shading. Visual Computing Systems CMU , Fall 2013

GPU Memory Model Overview

Order Independent Transparency with Dual Depth Peeling. Louis Bavoil, Kevin Myers

EECS 487: Interactive Computer Graphics

8/5/2012. Introduction. Transparency. Anti-Aliasing. Applications. Conclusions. Introduction

CS GAME PROGRAMMING Question bank

FRUSTUM-TRACED RASTER SHADOWS: REVISITING IRREGULAR Z-BUFFERS

Enhancing Traditional Rasterization Graphics with Ray Tracing. March 2015

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Ray Tracing. Computer Graphics CMU /15-662, Fall 2016

Today. Rendering algorithms. Rendering algorithms. Images. Images. Rendering Algorithms. Course overview Organization Introduction to ray tracing

Rendering Objects. Need to transform all geometry then

Real-Time Voxelization for Global Illumination

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Game Technology. Lecture Physically Based Rendering. Dipl-Inform. Robert Konrad Polona Caserman, M.Sc.

GCN Performance Tweets AMD Developer Relations

Technical Guide. Updated August 24, Page 1 of 19

NVSG NVIDIA Scene Graph

Hello, Thanks for the introduction

INFOMAGR Advanced Graphics. Jacco Bikker - February April Welcome!

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015

Shrinath Shanbhag Senior Software Engineer Microsoft Corporation

global light baking software

High-Quality Surface Splatting on Today s GPUs

The Rasterization Pipeline

FRUSTUM-TRACED RASTER SHADOWS: REVISITING IRREGULAR Z-BUFFERS

GPU Ray Tracing at the Desktop and in the Cloud. Phillip Miller, NVIDIA Ludwig von Reiche, mental images

Practical Techniques for Ray Tracing in Games. Gareth Morgan (Imagination Technologies) Aras Pranckevičius (Unity Technologies) March, 2014

GUERRILLA DEVELOP CONFERENCE JULY 07 BRIGHTON

3D Computer Games Technology and History. Markus Hadwiger VRVis Research Center

Enabling immersive gaming experiences Intro to Ray Tracing

Applications of Explicit Early-Z Culling

Vulkan on Mobile. Daniele Di Donato, ARM GDC 2016

Real-Time Graphics Architecture. Kurt Akeley Pat Hanrahan. Ray Tracing.

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

CIS 581 Interactive Computer Graphics

Sparse Fluid Simulation in DirectX. Alex Dunn Dev. Tech. NVIDIA

Real Time Rendering of Complex Height Maps Walking an infinite realistic landscape By: Jeffrey Riaboy Written 9/7/03

Achieving High-performance Graphics on Mobile With the Vulkan API

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y.

Interactive Light Mapping with PowerVR Ray Tracing

Graphics Hardware. Instructor Stephen J. Guy

MASSIVE TIME-LAPSE POINT CLOUD RENDERING with VR

Chapter IV Fragment Processing and Output Merging. 3D Graphics for Game Programming

PowerVR Series5. Architecture Guide for Developers

Beginning Direct3D Game Programming: 1. The History of Direct3D Graphics

Render-To-Texture Caching. D. Sim Dietrich Jr.

CS452/552; EE465/505. Clipping & Scan Conversion

Cornell University CS 569: Interactive Computer Graphics. Introduction. Lecture 1. [John C. Stone, UIUC] NASA. University of Calgary

Applications of Explicit Early-Z Z Culling. Jason Mitchell ATI Research

Ultimate Graphics Performance for DirectX 10 Hardware

Overview. Web Copy. NVIDIA Quadro M4000 Extreme Performance in a Single-Slot Form Factor

Today. Rendering algorithms. Rendering algorithms. Images. Images. Rendering Algorithms. Course overview Organization Introduction to ray tracing

The Traditional Graphics Pipeline

NVIDIA DESIGNWORKS Ankit Patel - Prerna Dogra -

Transcription:

New GPU Features of NVIDIA s Maxwell Architecture Holger Gruen Senior DevTech Engineer

AGENDA 9:30 am 10:30 am 11:00 am 12:00 am 12:30 am 13:30 pm 14:00 pm 15:00 pm 15:30 pm 16:30 pm 17:00 pm 18:00 pm Holger Gruen, New GPU Features of NVIDIA s Maxwell Architecture Iain Cantlay, NVIDIA SLI and stutter avoidance: a recipe for smooth gaming and perfect scaling with multiple GPUs Andrei Tatarinov,Tim Tcheblokov, Far Cry 4, Assassin's Creed Unity and War Thunder: Spicing up PC graphics with GameWorks Nathan Reed, VR Direct: How NVIDIA Technology Is Improving The VR Experience Alexey Panteleev, VXGI: dynamic global illumination for games Jeffrey Kiel, Russ Kerschner, Brighter and faster graphics with NVIDIA Nsight Visual Studio Edition 4.5 and beyond

Outline of this talk Architectural goals of Maxwell DirectX12 hardware features Conservative Rasterization Raster Order Views Tiled Resources Multi-Projection Acceleration New Antialiasing Features Misc other new features Questions and Answers

Direct3D 12 Latest high-performance graphics API Low-level model, even more direct Works across all Microsoft Platforms Supported by excellent tools Supports top PC hardware vendors

DirectX 12 Features New API is parallelizable for rendering on multicore CPUs Reduced API overhead for single-core work More nimble resource binding model using indexing More efficient data management/transfer model More explicit work scheduling model New Hardware Features Demos in station at the Microsoft Booth of API and tools

Outline of this talk Architectural goals of Maxwell DirectX12 hardware features Conservative Rasterization Raster Order Views Tiled Resources Multi-Projection Acceleration New Antialiasing Features Misc other new features Questions and Answers

Maxwell and its architectural Goals New architecture for improved effiency still on a 28nm process Massively improved perf / watt Focus on new graphics features Real-time GI for rich dynamic scenes Higher quality, programmable AA Working set management Scalable 2D graphics acceleration Create the best platform for DirectX 12

Maxwell and its architectural Goals New architecture for improved effiency still on a 28nm process Massively improved perf / watt Focus on new graphics features Real-time GI for rich dynamic scenes Higher quality, programmable AA Working set management Scalable 2D graphics acceleration Create the best platform for DirectX 12

Maxwell and its architectural Goals New architecture for improved effiency still on a 28nm process Massively improved perf / watt Focus on new graphics features Real-time GI for rich dynamic scenes Higher quality, programmable AA Working set management Scalable 2D graphics acceleration Create the best platform for DirectX 12

Maxwell and its architectural Goals New architecture for improved effiency still on a 28nm process Massively improved perf / watt Focus on new graphics features Real-time GI for rich dynamic scenes Higher quality, programmable AA Working set management Scalable 2D graphics acceleration Create the best platform for DirectX 12

Maxwell and its architectural Goals New architecture for improved effiency still on a 28nm process Massively improved perf / watt Focus on new graphics features Real-time GI for rich dynamic scenes Higher quality, programmable AA Working set management Scalable 2D graphics acceleration Create the best platform for DirectX 12

Outline of this talk Architectural goals of Maxwell DirectX12 hardware features Conservative Rasterization Raster Order Views Tiled Resources Multi-Projection Acceleration New Antialiasing Features Misc other new features Questions and Answers

Standard Rasterization Limitations Rasterization can t easily create data-structures Drops sub-pixel triangles Data-structures used in later compute passes E.g. Ray-Tracing

Standard Rasterization Limitations Rasterization can t easily create data-structures Drops sub-pixel triangles Data-structures used in later compute passes E.g. Ray-Tracing

Conservative Rasterization Draws all pixels a triangle touches Different Tiers see DX spec Possible before through GS trick but relatively slow See J. Hasselgren et. Al, Conservative Rasterization, GPU Gems 2 Now we can use rasterization do implement some nice techniques!

Conservative Rasterization Usecases C. Wymann et. al, Frustum-Traced Raster Shadows: Revisiting Irregular Z-Buffers, I3D 2015 J. Story Hybrid Ray-Traced Shadows, D3D Day GDC 2015

Conservative Rasterization Usecases C. Wymann et. al, Frustum-Traced Raster Shadows: Revisiting Irregular Z-Buffers, I3D 2015 J. Story Hybrid Ray-Traced Shadows, D3D Day GDC 2015

Hybrid Raytraced Shadows using a NxNxd primitive map See J. Story, Hybrid Ray-Traced Shadows, D3D Day GDC 2015 Prim Buffer Triangle vertices Prim Count Map NxN Prim Indices Map Prim buffer indices of triangles Prim Count Map # of tris per texel Raytrace triangles in a later pass Prim Indices Map NxNxd Prim Buffer

Raytraced Shadows Demo

Shadow Map SM = 8K x 8K (256 MB)

Ray Traced SM = 3K x 3K (36 MB) PM = 1K x 1K x 64 ( 256 MB )

Outline of this talk Architectural goals of Maxwell DirectX12 hardware features Conservative Rasterization Raster Order Views Tiled Resources Multi-Projection Acceleration New Antialiasing Features Misc other new features Questions and Answers

UAV Race Condition Issue Pixel shader writes to UAVs are unordered Can t guarantee determinism Can t do a number of things Programmable Blending Smart OIT implementations Arbitray g-buffer data packing Other per-pixel-data structures

Raster Order Views (ROVs) ROVs guarantee ordering Ordering doesn t come for free Depth complexity affects performance Always compare with alternative implementations Advanced Blending Ops Atomics

Outline of this talk Architectural goals of Maxwell DirectX12 hardware features Conservative Rasterization Raster Order Views Tiled Resources Multi-Projection Acceleration New Antialiasing Features Misc other new features Questions and Answers

DX12 Tiled Resources Full support for tiled 3D Textures/Arrays On top of what DX11.2 provides Recap Tiled Resources Enable fine grained working set management Texture defined as set of tiles Memory for tiles allocated separately

Tiled Resources applications Fine-grained working set management Texture streaming/ Clipmaps Variable resolution resources Adaptive shadow maps Sparse multi-resolution rendering Sparse representation Voxel grids Simulation physics, path finding

Tiled Resources applications Fine-grained working set management Texture streaming/ Clipmaps Variable resolution resources Adaptive shadow maps Sparse multi-resolution rendering Sparse representation Voxel grids Simulation physics, path finding

Tiled Resources applications Fine-grained working set management Texture streaming/ Clipmaps Variable resolution resources Adaptive shadow maps Sparse multi-resolution rendering Sparse representation Voxel grids Simulation physics, path finding

Tiled Resources applications Fine-grained working set management Texture streaming/ Clipmaps Variable resolution resources Adaptive shadow maps Sparse multi-resolution rendering Sparse representation Voxel grids Simulation physics, path finding

Tiled Resources applications Fine-grained working set management Texture streaming/ Clipmaps Variable resolution resources Adaptive shadow maps Sparse multi-resolution rendering Sparse representation Voxel grids Simulation physics, path finding

Sparse Fluid Simulation Uses tiled resources to only simulate/store grid cells that contain fluid Save computation time and memory See Alex Dunn, Sparse Fluid Simulation in DirectX, D3D Day GDC 2015

Sparse Fluid Demo

Outline of this talk Architectural goals of Maxwell DirectX12 hardware features Conservative Rasterization Raster Order Views Tiled Resources Multi-Projection Acceleration New Antialiasing Features Misc other new features Questions and Answers

Geometry Shader Challenges Significant overhead even for pass-through cases Significant overhead for viewport selection Significant amplification overhead for multiple viewports

Multi-Projection Acceleration Fast Geometry shader pass-through Fast viewport multi-casting Maxwell accelerates: Voxelization Cube-Map Rendering Cascaded shadow maps Multi-resolution rendering

Multi-Projection Acceleration Fast Geometry shader pass-through Fast viewport multi-casting Maxwell accelerates: Voxelization Cube-Map Rendering Cascaded shadow maps Multi-resolution rendering

Multi-Projection Acceleration Fast Geometry shader pass-through Fast viewport multi-casting Maxwell accelerates: Voxelization Cube-Map Rendering Cascaded shadow maps Multi-resolution rendering

Multi-Projection Acceleration Fast Geometry shader pass-through Fast viewport multi-casting Maxwell accelerates: Voxelization Cube-Map Rendering Cascaded shadow maps Multi-resolution rendering

Voxel Based GI - VXGI Uses multi-projection for fast voxelization See Alexey Panteleev s talk later today, VXGI: dynamic global illumination for games

VXGI Result VXGI Reference rendering with NVIDIA Iray

VXGI demo

Multi-Projection API Support OpenGL+Android: NV_geometry_shader_passthrough extension for GS pass-through NV_viewport_array2 extension for viewport multicast The extension specs have good shader examples DX11/DX12: No explicit API publicly available yet stay tuned

Outline of this talk Architectural goals of Maxwell DirectX12 hardware features Conservative Rasterization Raster Order Views Tiled Resources Multi-Projection Acceleration New Antialiasing Features Misc other new features Questions and Answers

Quick Multisampling Recap

Target-independent Multisampling Decouples visibility rate from color sample rate Allows lower color buffer storage cost for custom AA techniques Introduces coverage reduction stage

Post-depth Coverage Pre-Maxwell : Coverage Mask delivered is pre-depth-test coverage No way to get at the post-depth-test coverage Maxwell can deliver post-depth-coverage to the pixel shader

Multisample Coverage Override Pre-Maxwell : Shader can only reduce coverage sample set Maxwell can fully override raster-coverage mask

Aggregate G-Buffer AA demo C. Crassin et. al, Aggregate G-Buffer Anti-Aliasing, ID3D 2015 Uses post depth coverage to only process visible sub-samples Uses coverage override to route to right sub-sample cluster Other work using Maxwell AA features: E. Enderton et. al, Accumulative Anti- Aliasing, to appear

MSAA Coverage to Color Conversion

Programmable Sample Locations Sample locations fully programmable Foundation for Multi Frame sampled AA Interleaved sample positions 16x sample locations can be tiled to a set of pixels

Programmable Sample Locations Sample locations fully programmable Foundation for Multi Frame sampled AA Interleaved sample positions 16x sample locations can be tiled to a set of pixels

Programmable Sample Locations Sample locations fully programmable Foundation for Multi Frame sampled AA Interleaved sample positions 16x sample locations can be tiled to a set of pixels

API support Antialiasing Features DirectX 12 Target-independent multipsampling OpenGL+ Android: Target-independent multisampling control: NV_framebuffer_mixed_samples EXT_raster_multisample Coverage to color conversion: NV_fragment_coverage_to_color Post-depth coverage : EXT_post_depth_coverage Multisample coverage override : NV_sample_mask_override_coverage Programmable sample locations : NV_sample_locations NvAPI: Coming soon

Outline of this talk Architectural goals of Maxwell DirectX12 hardware features Conservative Rasterization Raster Order Views Tiled Resources Multi-Projection Acceleration New Antialiasing Features Misc other new features Questions and Answers

Screen Space BBox Rasterization Screen Space Bbox rasterization Reduce # of vertices sent to GPU Speeds up particle systems, point sprite etc. Supported by these APIs: OpenGL: NV_fill_rectangle extension NvAPI: coming soon

Min/Max texture filtering Hardware support for min/max filtering Usecases: Min-Max shadow maps Other min-max reduction chains API support: OpenGL: EXT_texture_filter_minmax DirectX11.2 0 5 3 2 MAX returns 5 0 5 3 2 MIN returns 0

New Interlocked Operations 1 2D-vector: two 16bit floating point numbers 16bit fp 16bit fp 4D-vector: four 16bit floating point numbers 16bit fp 16bit fp 16bit fp 16bit fp

New Interlocked Operations 2 Usecases: Reduce the number of Interlocked ops during e.g. light accumulation Save memory if you only need 16bit values API support OpenGL + Android: NV_shader_atomic_fp16_vector NvAPI: coming soon

Extended Blend Modes ZERO SRC DST SRC_OVER DST_OVER SRC_IN DST_IN SRC_OUT DST_OUT SRC_ATOP DST_ATOP XOR PLUS PLUS_CLAMPED PLUS_CLAMPED_ALPHA MULTIPLY SCREEN OVERLAY DARKEN LIGHTEN COLORDODGE COLORBURN HARDLIGHT SOFTLIGHT SOFTLIGHT_SVG DIFFERENCE MINUS MINUS_CLAMPED EXCLUSION CONTRAST INVERT INVERT_RGB INVERT_KHR LINEARDODGE LINEARBURN VIVIDLIGHT LINEARLIGHT PINLIGHT HARDMIX RED GREEN BLUE HSL_HUE HSL_SATURATION HSL_COLOR HSL_LUMINOSITY

Outline of this talk Architectural goals of Maxwell DirectX12 hardware features Conservative Rasterization Raster Order Views Tiled Resources Multi-Projection Acceleration New Antialiasing Features Misc other new features Questions and Answers

GameWorks Get the latest information for developers from NVIDIA and continue the discussion gameworks.nvidia.com