Programming Tips For Scalable Graphics Performance

Game Developers Conference 2009
Programming Tips For Scalable Graphics Performance
March 25, 2009, Room 2010
Luis Gimenez, Graphics Architect
Ganesh Kumar, Application Engineer
Katen Shah, Graphics Architect

Agenda
- Why Optimize for Scalable Graphics
- Intel GMA Series Architecture and Tools
- Balance Workload Between CPU and GPU
- Minimize Runtime and Driver Overhead
- Optimize Shader Performance
- Case Study
- Q&A

Developing for Integrated Graphics Allows You to Sell Your Game to More Customers!
[Chart: PC Graphics Market Segment (millions): Desktop Integrated, Desktop Discrete, Mobile Integrated, Mobile Discrete. Source: Mercury Research (Q4 08)]

Scale Your Game!

Intel Integrated Graphics (IIG) Architecture
[Block diagram. Labels: Memory; Commands; Internal buses; Cmd Streamer; Video Processing; 2D; Display; Memory/Cache; VF, VS, GS, Clip, Setup, Rast / Early-Z, SO; Thread Dispatch; I$ Cache; Array of Execution Units (EU 0 to EU n, Row0 to RowN); Sampler; Texture Cache; Render Cache; Pixel Ops]
Intel GMA 3 & GMA 4 Series support SM4

Intel's New Graphics Performance Analyzers
Today, 2:30 PM - 3:30 PM in Room 3004, West Hall
- SYSTEM ANALYZER
- FRAME ANALYZER

Optimization Hints For Intel Integrated Graphics
How to avoid frequent pitfalls found in testing integrated graphics playability over numerous games every year:
- Balance Workload Between CPU and GPU
- Minimize Runtime and Driver Overhead
- Optimize Shader Performance

Balance the Workload Between the CPU and the GPU
OCEAN FOG DEMO
- Complex Algorithms, Physics/AI, Simulation, Animation, Pre-computing
- Massive Data Parallelism, Per Pixel Lighting, Shadows, Post Processing, Blending, Animation
Pre-computing the Perlin textures on the CPU and using the GPU for rendering nearly doubled the frame rate.

Maximize CPU and GPU Utilization: Avoid Stalling the Pipeline!
[Diagram: read-back through the GPU command buffer. 1. CopyResource copies the render output to a staging resource. 2. Map() the staging resource. 3. Stall until flush.]
To avoid stalling the CPU, minimize:
- Data read-back
- Serializing events
- Queries

Maximize CPU and GPU Utilization: Avoid Stalling the Pipeline!
[Timeline diagram: CPU and GPU frames F0 to F5; STALL periods on either side cause STUTTERING; with N-2 SYNCH the CPU and GPU frames overlap]
To avoid stalling the CPU, minimize:
- Data read-back
- Serializing events
- Queries
Also:
- Put space between locks
- Synchronize to N-1 to N-2 frames
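The "synchronize to N-2 frames" advice can be illustrated without any graphics API. The following is a minimal sketch, assuming a ring of read-back slots in which the CPU only consumes the result issued two frames earlier. `ReadbackSlot`, `LatentReadback`, `issue` and `fetch` are hypothetical names, and the `payload` integer stands in for whatever a staging resource would hold; in a Direct3D 10 renderer `issue` would sit next to the CopyResource call and `fetch` next to Map() for read.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for one staging resource and the frame that filled it.
struct ReadbackSlot {
    std::int64_t frame = -1;  // frame whose copy was issued into this slot; -1 means empty
    int payload = 0;          // placeholder for the copied data
};

// Ring of kLatency + 1 slots, so the slot being read is never the one
// being written in the same frame.
template <std::size_t kLatency>
class LatentReadback {
public:
    // Record the copy issued for `frame`.
    void issue(std::int64_t frame, int payload) {
        assert(frame >= 0);
        slots_[index(frame)] = ReadbackSlot{frame, payload};
    }

    // Return the result issued kLatency frames before `frame`, if there is one.
    std::optional<int> fetch(std::int64_t frame) const {
        const std::int64_t wanted = frame - static_cast<std::int64_t>(kLatency);
        if (wanted < 0) return std::nullopt;  // nothing that old exists yet
        const ReadbackSlot& slot = slots_[index(wanted)];
        if (slot.frame != wanted) return std::nullopt;
        return slot.payload;
    }

private:
    static std::size_t index(std::int64_t frame) {
        return static_cast<std::size_t>(frame) % (kLatency + 1);
    }

    std::array<ReadbackSlot, kLatency + 1> slots_{};
};
```

With `LatentReadback<2>`, the first two frames return nothing and every later frame receives data that is two frames old, which is the trade the slide proposes: slightly stale results in exchange for not waiting on the GPU.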

Maximize CPU and GPU Utilization: Avoid Stalling the Pipeline!
The IIG driver optimizes the workload before sending it to the GPU.
[Diagram. Labels: App, Direct3D, Intel Driver; Memory: Commands, Vertex Buffers, Index Buffer, Texture, Texture Buffer, Texture, Depth / Color, Display Buffer; pipeline: Cmd Parser, Vertex Shader, Geometry Shader, Stream Out, Clipper, Setup/Rasterization, Pixel Shader, Output Merger]
To avoid stalling the CPU, minimize:
- Data read-back
- Serializing events
- Queries
Also:
- Put space between locks
- Synchronize to N-1 to N-2 frames
- Reduce CPU work and optimize driver performance by reducing state changes and the creation and destruction of resources

Optimization Hints For Intel Integrated Graphics
- Balance Load Between CPU and GPU
- Minimize Runtime and Driver Overhead
- Optimize Shader Performance

Minimizing Runtime and Driver Overhead
Manage Your DirectX 10 Resources!
DirectX 10 manages resources based on USAGE and CPU_ACCESS_FLAG. The best memory location is decided by the OS/driver/memory manager.
DX10 usage, by update frequency:
- IMMUTABLE (non-mappable). Update frequency: never. Access: GPU read. Resource update: Create() at creation, never updated. Use: static VBs/IBs/textures.
- DEFAULT (non-mappable). Update frequency: <=1 per frame. Access: GPU read/write. Resource update: Copy(), Update(); use only for CBs and small textures. Use: VBs/IBs/CBs/textures.
- DYNAMIC (mappable). Update frequency: >1 per frame. Access: CPU write, GPU read. Resource update: Map() with WRITE_NO_OVERWRITE for partial updates of VBs/IBs; WRITE_DISCARD for full updates or CBs. Use: dynamically updated VBs/IBs and CBs.
- STAGING (mappable), CPU transfers data to the GPU. Access: CPU Copy() read/write, GPU indirect read/write. Resource update: Map() for write to mapped memory, with WRITE/DO_NOT_WAIT_FLAG to avoid stalls; Copy() from the staging resource to video memory. Use: texture updates.
- STAGING (mappable), read-back from the GPU. Resource update: Copy() the GPU output to a staging resource; Map() for read with DO_NOT_WAIT_FLAG to avoid stalls. Use: surfaces for read-back.

Minimizing Runtime and Driver Overhead
Optimize Your Constants Access!
- The IIG driver optimizes the most frequently used constants for DX9/10
- Avoid global constants
- Limit dynamically indexed constants: C[a0], C[r]
- In DX10, when a constant changes the complete buffer gets updated
- Group cbuffers by frequency of updates
- Organize cbuffers based on feature scaling
- Inside a cbuffer, order constants by access sequence
- Inside cbuffers, pack data into float4 boundaries
[Image: Fog Demo]
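The cbuffer advice (group by update frequency, pack to float4 boundaries) can be checked on the CPU side at compile time. The following is a minimal sketch, assuming C++ structs that mirror two hypothetical cbuffers; the struct and field names are invented for illustration and do not come from the presentation. It relies on the HLSL rule that constants live in 16-byte registers and a variable may not straddle a register boundary.

```cpp
#include <cassert>
#include <cstddef>

struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };
struct Float4x4 { float m[16]; };

// Hypothetical cbuffer that changes once per frame.
struct alignas(16) PerFrameConstants {
    Float4x4 viewProj;  // registers 0 to 3
    Float3 fogColor;    // register 4, xyz
    float fogDensity;   // register 4, w: fills the slot instead of wasting it
};

// Hypothetical cbuffer that changes once per draw call. Keeping it separate
// means a per-object change does not re-upload the per-frame data.
struct alignas(16) PerObjectConstants {
    Float4x4 world;  // registers 0 to 3
    Float4 tint;     // register 4
};

static_assert(sizeof(PerFrameConstants) == 80, "five float4 registers");
static_assert(offsetof(PerFrameConstants, fogColor) % 16 == 0,
              "fogColor starts on a register boundary");
static_assert(sizeof(PerObjectConstants) % 16 == 0, "whole registers only");

// Number of float4 registers occupied by a buffer of `bytes` bytes.
inline std::size_t registerCount(std::size_t bytes) {
    assert(bytes % 16 == 0);
    return bytes / 16;
}

// True when a field at `offset` with `size` bytes stays inside one register.
inline bool fitsInOneRegister(std::size_t offset, std::size_t size) {
    assert(size > 0 && size <= 16);
    return offset / 16 == (offset + size - 1) / 16;
}
```

A three-float field placed at byte offset 8 would fail `fitsInOneRegister`, which is the kind of layout the shader compiler would otherwise pad, silently growing the buffer.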

Minimizing Runtime and Driver Overhead
Batch Your Primitives!
- Use large batches: >200-1K primitives
- Minimize state changes between batches
- Use instancing for small batches

Optimization Hints For Intel Integrated Graphics
- Balance Load Between CPU and GPU
- Minimize Runtime and Driver Overhead
- Optimize Shader Performance
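The "Batch Your Primitives" advice can be sketched as sorting draw calls by render state and merging neighbours that share it. This is a minimal sketch under stated assumptions: `DrawItem`, `stateKey`, `sortByState` and `mergeAdjacent` are hypothetical names, state is reduced to a shader id and a texture id, and merging assumes the geometry of adjacent items can actually be submitted in one call (for example because it shares a vertex buffer).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw request: two pieces of state plus a primitive count.
struct DrawItem {
    std::uint32_t shaderId;
    std::uint32_t textureId;
    std::uint32_t primitiveCount;
};

// Pack shader and texture into one key so equal state compares equal.
inline std::uint64_t stateKey(const DrawItem& d) {
    return (static_cast<std::uint64_t>(d.shaderId) << 32) | d.textureId;
}

// Order draws so that items sharing state become neighbours.
inline void sortByState(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return stateKey(a) < stateKey(b);
                     });
}

struct Batch {
    std::uint64_t key;
    std::uint32_t primitiveCount;
};

// Merge neighbouring draws with identical state. The number of batches
// returned equals the number of state changes the submission would need.
inline std::vector<Batch> mergeAdjacent(const std::vector<DrawItem>& items) {
    std::vector<Batch> batches;
    for (const DrawItem& d : items) {
        assert(d.primitiveCount > 0);
        const std::uint64_t key = stateKey(d);
        if (!batches.empty() && batches.back().key == key) {
            batches.back().primitiveCount += d.primitiveCount;
        } else {
            batches.push_back(Batch{key, d.primitiveCount});
        }
    }
    return batches;
}
```

Calling `mergeAdjacent` before and after `sortByState` gives a quick count of how many state changes the sort saves. Batches that remain well below the 200-primitive mark after merging are the candidates for instancing.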

Optimizing Shader Performance
Skip Computes that do not Render!
- Test for visibility to reject objects that fall outside the view frustum
- Maximize use of Early-Z (cost 4 pixels/clock hardware)
- Avoid modifying the Z value (oDepth) in the pixel shader
- Use Occlusion Query for complex scenes
- Use LOD to reduce complexity for objects that are distant

Optimizing Shader Performance
Optimize the Use of the Intel Integrated Graphics HW!
[Block diagram. Labels: Cmd Streamer; VF, VS, GS, Clip, Setup, Rast / Early-Z, SO; Thread Dispatch; I$ Cache; Array of Execution Units (EU 0 to EU n, Row0 to RowN); Sampler; Texture Cache; Render Cache; Pixel Ops]
- For best EU utilization, minimize register usage
- Sample textures at a >4:1 ratio of instructions per texture sample
- Large shaders impact performance due to the limited number of registers
- Use flow control smartly
- Mask alpha when not needed
- Minimize use of transcendentals like LOG, POW, EXP, etc.
- Pre-load shaders to avoid mid-scene compiles
- Avoid mid-scene texture changes
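The visibility test from the "Skip Computes that do not Render" slide is commonly done with bounding spheres against the six frustum planes. The following is a minimal sketch of that general technique, not the presenters' code; `Plane`, `Sphere`, `isVisible` and `makeBoxFrustum` are hypothetical names, plane normals are assumed to be unit length and to point into the visible volume, and the box-shaped "frustum" exists only to keep the example easy to verify.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with n pointing into the visible volume.
struct Plane {
    Vec3 n;
    float d;
};

struct Sphere {
    Vec3 center;
    float radius;
};

inline float signedDistance(const Plane& p, const Vec3& v) {
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d;
}

// Conservative test: returns false only when the sphere lies completely
// behind at least one plane. Spheres near a corner may still pass.
inline bool isVisible(const std::array<Plane, 6>& frustum, const Sphere& s) {
    assert(s.radius >= 0.0f);
    for (const Plane& p : frustum) {
        if (signedDistance(p, s.center) < -s.radius) return false;
    }
    return true;
}

// Axis-aligned cube of half-extent h around the origin, used as a stand-in
// for a real view frustum.
inline std::array<Plane, 6> makeBoxFrustum(float h) {
    assert(h > 0.0f);
    return {{
        Plane{Vec3{1.0f, 0.0f, 0.0f}, h}, Plane{Vec3{-1.0f, 0.0f, 0.0f}, h},
        Plane{Vec3{0.0f, 1.0f, 0.0f}, h}, Plane{Vec3{0.0f, -1.0f, 0.0f}, h},
        Plane{Vec3{0.0f, 0.0f, 1.0f}, h}, Plane{Vec3{0.0f, 0.0f, -1.0f}, h},
    }};
}
```

Objects that fail the test can skip their draw calls entirely, so neither the driver nor the shader units spend time on them.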

Optimizing Shader Performance
Scale Your Pixel Shader and Textures!
- Keep your textures under 256x256 and in the same format if possible
- Prefer multi-texture over multi-pass
- Use compressed textures and mip-maps
- Use texture arrays / texture atlas
- Minimize Lock/Blit of Z and/or stencil buffer
- Use shadow maps for IIG and stencil shadows as a scalable feature
- Minimize Clear() of surfaces
- Minimize post processing passes

Optimizing for IIG: Demigod

Key Lessons Learned from Optimizing Demigod for IIG

Be Wary of Clear Calls
Why:
- Costlier than you might think
- Affects every pixel on the surface
Recommendations:
- Make sure unused surfaces don't get cleared unnecessarily
- Consider reducing surface resolution when in lower LOD
- Clear Color, Stencil and Z-Buffer in the same API call

Prune Costly Clear Calls

Reduce the Number of Texture Fetches
- Texture cache is limited on integrated graphics
- Reducing texture sizes alone doesn't help as much
- Optimize shaders by reducing texture fetches in low-fidelity modes
- Balance texture load instructions with arithmetic instructions if possible

Simplify Post Processing Effects
Post processing effects that use multiple passes:
- Bloom
- Motion Blur
- Depth of Field
- High Dynamic Range
Balance visual quality with speed by reducing the number of passes.

Demigod Bloom Effect
[Screenshots: Before, After, Bloom turned Off, Bloom On with Fewer Passes]

Avoid Pixel Overdraw
Render opaque objects from front to back:
- Render UI and other HUDs first
- Render sky and terrain last
Early-Z architecture eliminates occluded pixels early in the pipeline.

Example of Back to Front Rendering
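The front-to-back ordering described under "Avoid Pixel Overdraw" amounts to a sort on draw layer and view depth. The following is a minimal sketch, assuming each opaque draw carries a world-space position; `OpaqueDraw`, `Layer` and `sortFrontToBack` are hypothetical names. The slide only says HUD first and sky and terrain last, so grouping sky and terrain into a single `Background` layer is an assumption made here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Submission order of the layers; smaller values are drawn first.
enum class Layer { Hud = 0, Scene = 1, Background = 2 };

struct OpaqueDraw {
    int id;
    Layer layer;
    Vec3 position;  // world-space center of the object
};

// Distance of the object along the camera's viewing direction.
inline float viewDepth(const OpaqueDraw& d, const Vec3& eye, const Vec3& forward) {
    const Vec3 toObject{d.position.x - eye.x, d.position.y - eye.y, d.position.z - eye.z};
    return dot(toObject, forward);
}

// Sort by layer first, then nearest to farthest inside each layer, so that
// early depth testing can reject pixels hidden by closer geometry.
inline void sortFrontToBack(std::vector<OpaqueDraw>& draws, const Vec3& eye,
                            const Vec3& forward) {
    assert(dot(forward, forward) > 0.0f);
    std::stable_sort(draws.begin(), draws.end(),
                     [&](const OpaqueDraw& a, const OpaqueDraw& b) {
                         if (a.layer != b.layer) return a.layer < b.layer;
                         return viewDepth(a, eye, forward) < viewDepth(b, eye, forward);
                     });
}
```

Transparent objects are deliberately left out: they generally need the opposite, back-to-front order for blending and would be sorted in a separate pass.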

Moving Terrain Rendering to the End

Lastly, Add Benchmark Mode to Your Game for Performance Profiling!
It helps to characterize the workload.
Four key requirements a benchmark must provide:
1. Accurately reflect real workload
2. Repeatability
3. Ability to run standalone without Internet
4. Ability to automate: built-in demo, command-line execution and output to a log file
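The automation requirement (built-in demo, command-line execution, log output) can be sketched as a small argument parser plus a frame-time summary. This is a minimal sketch, not a prescribed interface: the `--benchmark <demo>` and `--log <path>` flags, the `BenchmarkOptions` and `FrameStats` types and the comma-separated log format are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

struct BenchmarkOptions {
    bool enabled = false;
    std::string demo;     // name of the built-in demo to replay
    std::string logPath;  // file that receives the results
};

// Parse the hypothetical flags "--benchmark <demo>" and "--log <path>".
inline BenchmarkOptions parseBenchmarkArgs(const std::vector<std::string>& args) {
    BenchmarkOptions opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const bool hasValue = i + 1 < args.size();
        if (args[i] == "--benchmark" && hasValue) {
            opts.enabled = true;
            opts.demo = args[++i];
        } else if (args[i] == "--log" && hasValue) {
            opts.logPath = args[++i];
        }
    }
    return opts;
}

struct FrameStats {
    double avgMs;
    double minMs;
    double maxMs;
    double avgFps;
};

// Reduce per-frame times in milliseconds to the figures a log would report.
inline FrameStats summarize(const std::vector<double>& frameMs) {
    assert(!frameMs.empty());
    const double total = std::accumulate(frameMs.begin(), frameMs.end(), 0.0);
    const auto range = std::minmax_element(frameMs.begin(), frameMs.end());
    const double avg = total / static_cast<double>(frameMs.size());
    return FrameStats{avg, *range.first, *range.second, 1000.0 / avg};
}

// One comma-separated line per run, easy to collect from automated runs.
inline std::string formatLogLine(const std::string& demo, const FrameStats& s) {
    return demo + "," + std::to_string(s.avgMs) + "," + std::to_string(s.minMs) +
           "," + std::to_string(s.maxMs) + "," + std::to_string(s.avgFps);
}
```

For repeatability the demo itself would also need a fixed camera path and fixed random seeds, which this sketch does not cover.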

Summary
Scale Your Game for Integrated!
- Balance CPU and GPU workload, avoid stalls
- Minimize runtime and driver overhead
- Optimize your shader performance by scaling your game
- Analyze your game, find your most expensive call
- Balance your visual effects against performance penalties
- Add benchmark mode to your game

Additional Resources
- Developers Guide for Intel Integrated Graphics
- Articles Mentioned in this Presentation: com/en-us/articles/ocean-fog-using-direct3d-10
- using Intel Graphics Performance Analyzer
- Intel Graphics Community
- Integrated Graphics Software Development Forum: US/forums/2414/ShowForum.aspx
- Intel Laptop Gaming TDK

Enhance Your Products and Your Business
- Training the Next Generation
- The gateway to Intel's worldwide technology, engineering and go-to-market support for Visual Computing developers
- Get the Story Behind the Story
- Investing in Talent and Technology
- See What's New
- Developers Connecting with Intel Engineers

For More Information
Contact info
See Intel at GDC:
- Intel Booth at Expo, North Hall
- Intel Interactive Lounge, West Hall 3rd floor
Take a collateral DVD:
- Here in the room!
- Intel Booth or Interactive Lounge

GDC
Wednesday, March 25
- Programming Tips for Scalable Graphics: 10:30 AM - 11:30 AM in Room 2010, West Hall
- Threaded AI For the Win!: 12:00 PM - 1:00 PM in Room 2011, West Hall
- Intel's New Graphics Performance Analyzers: 2:30 PM - 3:30 PM in Room 3004, West Hall
- Kaboom: Real-Time Multi-Threaded Fluid Simulation for Games: 4:00 PM - 5:00 PM in Room 2011, West Hall
Thursday, March 26
- Who Moved the Goalposts? The Rapidly Changing World of CPUs and Optimization: 1:30 PM - 2:30 PM in Room 2011, West Hall
- Taming Your Game Production Demons: the Offset approach: 3:00 PM - 4:00 PM in Room 2011, West Hall
- Optimizing Game Architectures with Intel Threading Building Blocks: 4:30 PM - 5:30 PM in Room 2011, West Hall

Last of GDC
Friday, March 27
- Procedural and Multi-Core Techniques to take Visuals to the Next Level: 9:00 AM - 10:00 AM in Room 2010, West Hall
- Rasterization on Larrabee: A First Look at the Larrabee New Instructions (LRBni) in Action: 9:00 AM - 10:00 AM in Room 135, North Hall
- SIMD Programming on Larrabee: A Second Look at the Larrabee New Instructions (LRBni) in Action: 10:30 AM - 11:30 AM in Room 3002, West Hall

Risk Factors
This presentation contains forward-looking statements. All statements made that are not historical facts are subject to a number of risks and uncertainties, and actual results may differ materially. Please refer to our most recent Earnings Release and our most recent Form 10-Q or 10-K filing available on our website for more information on the risk factors that could cause actual results to differ.
Rev. 4/17/07

Backup Slides

Both Intel GMA 3 and 4 support DirectX 10
Make your Scaling API Independent!
[Chart: Game Scaling across DX8, DX9 and DX10 at High Detail, Standard Detail and Low Detail; Recommendation]

Both Intel GMA 3 and 4 support all required D3D10 Features
D3D10 optional features:
- MSAA: only single sample supported
- 32-bit FP filtering: not supported
- 16-bit UNORM blending: supported in GMA X4XXX and beyond
- RGB32 RT: not supported
- Use ID3D10Device::CheckFormatSupport to check for supported formats
Other D3D10 performance considerations:
- Limit use of the GS; make it a scalable feature
- Use different Stream Out buffers for different SO formats
- Check for optional features before using them

Ultimate Graphics Performance for DirectX 10 Hardware

Ultimate Graphics Performance for DirectX 10 Hardware Ultimate Graphics Performance for DirectX 10 Hardware Nicolas Thibieroz European Developer Relations AMD Graphics Products Group nicolas.thibieroz@amd.com V1.01 Generic API Usage DX10 designed for performance

More information

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most

More information

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský Real - Time Rendering Pipeline optimization Michal Červeňanský Juraj Starinský Motivation Resolution 1600x1200, at 60 fps Hw power not enough Acceleration is still necessary 3.3.2010 2 Overview Application

More information

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager Optimizing for DirectX Graphics Richard Huddy European Developer Relations Manager Also on today from ATI... Start & End Time: 12:00pm 1:00pm Title: Precomputed Radiance Transfer and Spherical Harmonic

More information

Working with Metal Overview

Working with Metal Overview Graphics and Games #WWDC14 Working with Metal Overview Session 603 Jeremy Sandmel GPU Software 2014 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission

More information

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques Jonathan Zarge, Team Lead Performance Tools Richard Huddy, European Developer Relations Manager ATI

More information

Graphics Processing Unit Architecture (GPU Arch)

Graphics Processing Unit Architecture (GPU Arch) Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics

More information

PowerVR Hardware. Architecture Overview for Developers

PowerVR Hardware. Architecture Overview for Developers Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

The Application Stage. The Game Loop, Resource Management and Renderer Design

The Application Stage. The Game Loop, Resource Management and Renderer Design 1 The Application Stage The Game Loop, Resource Management and Renderer Design Application Stage Responsibilities 2 Set up the rendering pipeline Resource Management 3D meshes Textures etc. Prepare data

More information

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games Bringing AAA graphics to mobile platforms Niklas Smedberg Senior Engine Programmer, Epic Games Who Am I A.k.a. Smedis Platform team at Epic Games Unreal Engine 15 years in the industry 30 years of programming

More information

GCN Performance Tweets AMD Developer Relations

GCN Performance Tweets AMD Developer Relations AMD Developer Relations Overview This document lists all GCN ( Graphics Core Next ) performance tweets that were released on Twitter during the first few months of 2013. Each performance tweet in this

More information

Rendering Grass with Instancing in DirectX* 10

Rendering Grass with Instancing in DirectX* 10 Rendering Grass with Instancing in DirectX* 10 By Anu Kalra Because of the geometric complexity, rendering realistic grass in real-time is difficult, especially on consumer graphics hardware. This article

More information

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology Graphics Performance Optimisation John Spitzer Director of European Developer Technology Overview Understand the stages of the graphics pipeline Cherchez la bottleneck Once found, either eliminate or balance

More information

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Architectures Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Overview of today s lecture The idea is to cover some of the existing graphics

More information

GUERRILLA DEVELOP CONFERENCE JULY 07 BRIGHTON

GUERRILLA DEVELOP CONFERENCE JULY 07 BRIGHTON Deferred Rendering in Killzone 2 Michal Valient Senior Programmer, Guerrilla Talk Outline Forward & Deferred Rendering Overview G-Buffer Layout Shader Creation Deferred Rendering in Detail Rendering Passes

More information

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp Next-Generation Graphics on Larrabee Tim Foley Intel Corp Motivation The killer app for GPGPU is graphics We ve seen Abstract models for parallel programming How those models map efficiently to Larrabee

More information

PowerVR Series5. Architecture Guide for Developers

PowerVR Series5. Architecture Guide for Developers Public Imagination Technologies PowerVR Series5 Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

The Ultimate Developers Toolkit. Jonathan Zarge Dan Ginsburg

The Ultimate Developers Toolkit. Jonathan Zarge Dan Ginsburg The Ultimate Developers Toolkit Jonathan Zarge Dan Ginsburg February 20, 2008 Agenda GPU PerfStudio GPU ShaderAnalyzer RenderMonkey Additional Tools Tootle GPU MeshMapper CubeMapGen The Compressonator

More information

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies How to Work on Next Gen Effects Now: Bridging DX10 and DX9 Guennadi Riguer ATI Technologies Overview New pipeline and new cool things Simulating some DX10 features in DX9 Experimental techniques Why This

More information

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer Real-Time Rendering (Echtzeitgraphik) Michael Wimmer wimmer@cg.tuwien.ac.at Walking down the graphics pipeline Application Geometry Rasterizer What for? Understanding the rendering pipeline is the key

More information

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

Building scalable 3D applications. Ville Miettinen Hybrid Graphics Building scalable 3D applications Ville Miettinen Hybrid Graphics What s going to happen... (1/2) Mass market: 3D apps will become a huge success on low-end and mid-tier cell phones Retro-gaming New game

More information

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. 1 2 Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. Crowd rendering in large environments presents a number of challenges,

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

Applications of Explicit Early-Z Z Culling. Jason Mitchell ATI Research

Applications of Explicit Early-Z Z Culling. Jason Mitchell ATI Research Applications of Explicit Early-Z Z Culling Jason Mitchell ATI Research Outline Architecture Hardware depth culling Applications Volume Ray Casting Skin Shading Fluid Flow Deferred Shading Early-Z In past

More information

Windowing System on a 3D Pipeline. February 2005

Windowing System on a 3D Pipeline. February 2005 Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April

More information

The Rasterization Pipeline

The Rasterization Pipeline Lecture 5: The Rasterization Pipeline (and its implementation on GPUs) Computer Graphics CMU 15-462/15-662, Fall 2015 What you know how to do (at this point in the course) y y z x (w, h) z x Position objects

More information

Vulkan on Mobile. Daniele Di Donato, ARM GDC 2016

Vulkan on Mobile. Daniele Di Donato, ARM GDC 2016 Vulkan on Mobile Daniele Di Donato, ARM GDC 2016 Outline Vulkan main features Mapping Vulkan Key features to ARM CPUs Mapping Vulkan Key features to ARM Mali GPUs 4 Vulkan Good match for mobile and tiling

More information

PowerVR Performance Recommendations. The Golden Rules

PowerVR Performance Recommendations. The Golden Rules PowerVR Performance Recommendations Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind. Redistribution

More information

Achieving High-performance Graphics on Mobile With the Vulkan API

Achieving High-performance Graphics on Mobile With the Vulkan API Achieving High-performance Graphics on Mobile With the Vulkan API Marius Bjørge Graphics Research Engineer GDC 2016 Agenda Overview Command Buffers Synchronization Memory Shaders and Pipelines Descriptor

More information

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June Optimizing and Profiling Unity Games for Mobile Platforms Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June 1 Agenda Introduction ARM and the presenter Preliminary knowledge

More information

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008 Michael Doggett Graphics Architecture Group April 2, 2008 Graphics Processing Unit Architecture CPUs vsgpus AMD s ATI RADEON 2900 Programming Brook+, CAL, ShaderAnalyzer Architecture Challenges Accelerated

More information

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1 Markus Hadwiger, KAUST Reading Assignment #2 (until Feb. 17) Read (required): GLSL book, chapter 4 (The OpenGL Programmable

More information

Save the Nanosecond! PC Graphics Performance for the next 3 years. Richard Huddy European Developer Relations Manager ATI Technologies, Inc.

Save the Nanosecond! PC Graphics Performance for the next 3 years. Richard Huddy European Developer Relations Manager ATI Technologies, Inc. Save the Nanosecond! PC Graphics Performance for the next 3 years Richard Huddy European Developer Relations Manager ATI Technologies, Inc. A funny thing happened to me ATI is now broadly recognised and

More information

Render-To-Texture Caching. D. Sim Dietrich Jr.

Render-To-Texture Caching. D. Sim Dietrich Jr. Render-To-Texture Caching D. Sim Dietrich Jr. What is Render-To-Texture Caching? Pixel shaders are becoming more complex and expensive Per-pixel shadows Dynamic Normal Maps Bullet holes Water simulation

More information

Parallel Programming on Larrabee. Tim Foley Intel Corp

Parallel Programming on Larrabee. Tim Foley Intel Corp Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This

More information

Hardware-driven Visibility Culling Jeong Hyun Kim

Hardware-driven Visibility Culling Jeong Hyun Kim Hardware-driven Visibility Culling Jeong Hyun Kim KAIST (Korea Advanced Institute of Science and Technology) Contents Introduction Background Clipping Culling Z-max (Z-min) Filter Programmable culling

More information

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Lecture 2: Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload) Visual Computing Systems Today Finishing up from last time Brief discussion of graphics workload metrics

More information

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university Graphics Architectures and OpenCL Michael Doggett Department of Computer Science Lund university Overview Parallelism Radeon 5870 Tiled Graphics Architectures Important when Memory and Bandwidth limited

More information

Direct3D 11 Performance Tips & Tricks

Direct3D 11 Performance Tips & Tricks Direct3D 11 Performance Tips & Tricks Holger Gruen Cem Cebenoyan AMD ISV Relations NVIDIA ISV Relations Agenda Introduction Shader Model 5 Resources and Resource Views Multithreading Miscellaneous Q&A

More information

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský Real - Time Rendering Graphics pipeline Michal Červeňanský Juraj Starinský Overview History of Graphics HW Rendering pipeline Shaders Debugging 2 History of Graphics HW First generation Second generation

More information

DX10, Batching, and Performance Considerations. Bryan Dudash NVIDIA Developer Technology

DX10, Batching, and Performance Considerations. Bryan Dudash NVIDIA Developer Technology DX10, Batching, and Performance Considerations Bryan Dudash NVIDIA Developer Technology The Point of this talk The attempt to combine wisdom and power has only rarely been successful and then only for

More information

Introduction to the Direct3D 11 Graphics Pipeline

Introduction to the Direct3D 11 Graphics Pipeline Introduction to the Direct3D 11 Graphics Pipeline Kevin Gee - XNA Developer Connection Microsoft Corporation 2008 NVIDIA Corporation. Direct3D 11 focuses on Key Takeaways Increasing scalability, Improving

More information

Real-Time Hair Simulation and Rendering on the GPU. Louis Bavoil

Real-Time Hair Simulation and Rendering on the GPU. Louis Bavoil Real-Time Hair Simulation and Rendering on the GPU Sarah Tariq Louis Bavoil Results 166 simulated strands 0.99 Million triangles Stationary: 64 fps Moving: 41 fps 8800GTX, 1920x1200, 8XMSAA Results 166

More information

Threading Hardware in G80

Threading Hardware in G80 ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &

More information

GPU Computation Strategies & Tricks. Ian Buck NVIDIA

GPU Computation Strategies & Tricks. Ian Buck NVIDIA GPU Computation Strategies & Tricks Ian Buck NVIDIA Recent Trends 2 Compute is Cheap parallelism to keep 100s of ALUs per chip busy shading is highly parallel millions of fragments per frame 0.5mm 64-bit

More information

Introducing Metal 2. Graphics and Games #WWDC17. Michal Valient, GPU Software Engineer Richard Schreyer, GPU Software Engineer

Introducing Metal 2. Graphics and Games #WWDC17. Michal Valient, GPU Software Engineer Richard Schreyer, GPU Software Engineer Session Graphics and Games #WWDC17 Introducing Metal 2 601 Michal Valient, GPU Software Engineer Richard Schreyer, GPU Software Engineer 2017 Apple Inc. All rights reserved. Redistribution or public display

More information

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics Why GPU? Chapter 1 Graphics Hardware Graphics Processing Unit (GPU) is a Subsidiary hardware With massively multi-threaded many-core Dedicated to 2D and 3D graphics Special purpose low functionality, high

More information

Streaming Massive Environments From Zero to 200MPH

Streaming Massive Environments From Zero to 200MPH FORZA MOTORSPORT From Zero to 200MPH Chris Tector (Software Architect Turn 10 Studios) Turn 10 Internal studio at Microsoft Game Studios - we make Forza Motorsport Around 70 full time staff 2 Why am I

More information

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME) Pavel Petroshenko, Sun Microsystems, Inc. Ashmi Bhanushali, NVIDIA Corporation Jerry Evans, Sun Microsystems, Inc. Nandini

More information

Achieving Console Quality Games on Mobile

Achieving Console Quality Games on Mobile Achieving Console Quality Games on Mobile Peter Harris, Senior Principal Engineer, ARM Unai Landa, CTO, Digital Legends Jon Kirkham, Staff Engineer, ARM GDC 2017 Agenda Premium smartphone in 2017 ARM Cortex

More information

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane Rendering Pipeline Rendering Converting a 3D scene to a 2D image Rendering Light Camera 3D Model View Plane Rendering Converting a 3D scene to a 2D image Basic rendering tasks: Modeling: creating the world

More information
