Order Matters in Resource Creation

Size: px
Start display at page:

Download "Order Matters in Resource Creation"

Transcription

1 Order Matters in Resource Creation William Damon ATI Research, Inc. Introduction Latencies attributed to loading resources on the fly can seriously impact runtime performance. We typically avoid these hiccups by creating or loading resources ahead of the time we need to use them; but even then the physical locations in which our resources ultimately reside can have a serious impact on overall performance. In this article, we introduce and reinforce some common guidelines to keep in mind when setting up render resources. Depth/Stencil early Aside from the obvious resources that are created with the rendering context (i.e. the backbuffer and the optional depth-stencil surface), the best surfaces to create are additional depth-stencil surfaces and render targets. The APIs generally limit the format and size of depth-stencil surfaces to match those of the co-bound render target surfaces, so the best thing to do is to create one depth-stencil surface for each render target format/size combination for which the application requires such a resource. If the application won t be writing depth or stencil information for a particular render target format/size, then there is no need to create a corresponding depth-stencil surface. Generally, depth-stencil buffers can be shared across corresponding render targets or render passes so there is no need to create a unique depthstencil buffer per render target. In practice, most applications usually only require one, maybe two, depth-stencil buffers in addition to the default one that corresponds to the backbuffer. Create these depth-stencil surfaces in order of importance to the application. The reason this is important is that when depth-stencil buffers are created first (or at least very early) the driver can allocate them in the best location in local video-memory such that the buffers benefit from Hyper-Z technology. Render targets also early As stated above, another best resource to create as early as possible is any additional off-screen render targets the application will require. Sometimes it is not possible to know how many additional render targets will be required or even what format(s) they should take on. In that case, a good approach is to use the best heuristics available to make an educated guess as to what might be needed throughout the current resource pool lifetime (e.g. through a single level in a game). Obviously, the tradeoff here is that the application may end up creating render targets that it never uses, wasting valuable memory. This usage pattern, however, may be indicative of a larger problem, and the application architect(s) might consider revisiting the design. Alternatively, this approach may be the only way to implement an algorithm or solve a particular problem. In that case, the best the driver can ask for is that the application creates its render targets as early as possible.

2 Keep an eye on how many render targets are created and the size and format of those surfaces. A fullscreen render target at 1024x768 using 8-bits per channel consumes roughly 3MB of space. Add multisampling to that along with the corresponding multisampled depth-stencil buffer, and you re up to potentially 24MB for one render target! This is why a clear understanding of which off-screen render target surface formats and sizes will be required and how many render targets must coexist simultaneously is important. Creating these surfaces as early as possible allows the driver to place them in the appropriate memory location before spilling into not-as-optimal locations. Finally on the topic of render targets, the ordering of creation in terms of the formats and sizes used does not make much difference. The application will most probably benefit the most by allocating the most commonly used render targets first, followed by those less used. LVM followed by non-lvm then system memory Okay, now that the depth-stencil surfaces and render targets are allocated, the next best thing to create would be those resources that the application would prefer to have in local video-memory, followed by those that live in non-local video memory, and finally those that reside in system memory. In Direct3D terminology this translates into allocating D3DPOOL_DEFAULT resources followed by D3DPOOL_MANAGED resources. Actually, managed resources aren t loaded into LVM (or even non- LVM) until they are needed so really the application should create default pool resources before managed resources are paged in. The most effective way to ensure this is to create the default pool resources first, or evict managed resources immediately beforehand. Vertex and index buffers are a good thing to allocate at this point, as are textures. If an application is really pressed for memory, then the ordering of textures versus geometry buffers may differ from another application based on usage patterns. Say one application uses a lot of geometry but only a few textures, it may want to make sure all that geometry resides in LVM versus another application that uses less geometry but constantly switches textures and can afford the latency of fetching geometry from non-lvm memory while performing expensive pixel operations. When to go static Sometimes deciding whether a vertex or index buffer should be static or dynamic can be confusing. Adding to the confusion is the fact that index buffers behave slightly differently than vertex buffers. Here we will try to dispel some rumors and provide a bit of information as to how different buffers should be allocated. While this information is Direct3D-centric, the same concepts apply in OpenGL. Before we begin, however, let s get a bit of terminology out of the way. The term static has multiple meanings, so we cannot blindly say that locking a static buffer is bad. Buffers created without D3DUSAGE_DYNAMIC are not necessarily static, either, as far as the driver is concerned (regardless of vendor). Remember to keep this distinction in mind as we wade through the following discussion. Index Buffers Index buffers can be allocated in three memory pools: D3DPOOL_DEFAULT, D3DPOOL_MANAGED, or D3DPOOL_SYSTEM. While system memory pools suffer from a little extra overhead when copying data to the hardware, just about everything else about them does not cause any confusion or problems because everything happens in system memory. Consequently, we focus our discussion here on the default and managed memory locations.

3 The default pool An index buffer created in the default pool has the option of providing various usage flags at creation time: D3DUSAGE_WRITEONLY If not set, the driver will not create an LVM buffer. Instead, the Direct3D runtime will create a system memory copy of the resource to be flushed to the GPU upon first use (of each update). D3DUSAGE_DYNAMIC This flag indicates that the data in the buffer will change frequently. In particular, this flag stipulates that the contents presently being used in rendering will change frequently. Some drivers do not create a video memory surface in this case in favor of allowing the Direct3D runtime to create a system memory copy to which the CPU has direct writecombining access. If D3DUSAGE_WRITEONLY is set without D3DUSAGE_DYNAMIC, current drivers will try to create a LVM buffer. If this fails, then the driver must try to fall back to non-lvm. Now, whenever the application does a lock on a default pool index buffer, and the buffer is in video memory, the driver receives a lock call. A well-written application will use one of D3DLOCK_DISCARD or D3DLOCK_NOOVERWRITE. D3DLOCK_DISCARD indicates to the driver that it is safe to perform index buffer renaming (i.e. allocate or return another internal buffer without stalling). D3DLOCK_NOOVERWRITE signals that the application is not going to overwrite any of the contents already written (i.e. the driver is safe to return a pointer into the index buffer without stalling). In either case, the driver does not have to stall and the application need only write the data that it is updating. Failure to appropriately use these locking flags will cause the driver to stall while the current contents of the index buffer are done rendering. The managed pool An index buffer created in the managed pool cannot be marked with the usage flag D3DUSAGE_DYNAMIC; the Direct3D runtime disallows it. Also, there is no such thing as D3DUSAGE_STATIC at the API level, making life a little more interesting for the driver. When an index buffer is created in the managed pool, the Direct3D runtime creates the resource in system memory. All application locking calls only affect this system memory copy, and all updates happen here. The first time an unlock is made, the Direct3D runtime calls the driver and attempts to create a writeonly buffer in video memory that represents the managed buffer. Different vendors drivers use varying heuristics to determine whether this means allocating space in LVM or non-lvm. The nice thing about lock calls on managed pool resources is that they provide parameters for the offset and the size to lock, making things a bit simpler for the driver. Upon drawing with the index buffer, the runtime presents the driver with some information about how to best transfer the data from the updated system memory host copy into the video memory draw copy. Depending on where the resource ended up residing, various optimized copying mechanism can be invoked.

4 Note that allocating non-d3dusage_dynamic index buffers that exhibit dynamic behavior can sometimes be a win, especially on CrossFire (or similar) configurations. Now, with all this background information in mind, here are three common index buffer usage scenarios and some advice on how to allocate index buffers. An index buffer that only requires updates to areas that haven t already been written and aren t currently used in a draw In this case, the application should use the default pool and manage the locks with the locking flags described above. The dynamic usage flag should not be used. Alternatively, the managed pool may not be a bad option; it requires a bit more CPU work, and even some GPU overhead, but there shouldn t be any hardware stalls. An index buffer that requires updates to areas that have already been written and have been used in a draw call Here the application can go ahead and create the buffer in the default pool with the dynamic usage flag. Locks will likely not be expensive even though the locking flags are invalid for D3DUSAGE_DYNAMIC buffers because locking should happen to system memory buffer. Again, the managed pool isn t a bad alternative, and for cases in which the index buffer is updated once for several draw calls, the managed pool might be a better approach. Your mileage may vary. An index buffer that requires updates to the entire buffer every draw in which it s used Definitely use the default pool and manage the locks with the appropriate locking flags in this case. Do not set the dynamic usage flag, however. The managed pool is not a good option in this scenario. Vertex Buffers Vertex buffer creation and usage generally follows the same guidelines as index buffers, though the drivers and even the hardware may handle things a bit differently internally. Drivers generally try to create all vertex buffers in video memory, and dynamic vertex buffers generally end up in non-lvm for better CPU access. It is strongly recommended to NOT place vertex buffers (static or dynamic) in the system memory pool. The Direct3D runtime essentially behaves the same for vertex and index buffers. Other tidbits Always be sure to use the appropriate flags when creating resources through the API. The flags provide tremendous insight to the driver as to how the resource being created will be used, thus giving it clear direction as to where the best location for that resource will be for optimal performance. Also, avoid creating and destroying resources on-the-fly, per-frame. Resource allocation has tremendous overhead, comparatively, and this behavior can cause fragmentation and other memory-related problems. Occasionally, it makes sense to evict all the managed resources from video memory, like when switching levels or worlds in a game, and Direct3D provides an API for this. Performing this eviction will clean up lots of fragmentation that may have built up throughout the last level, and provide a clean slate for the next one. Lastly, should memory be a point of contention for your application, consider using ATI s plug-in for PIX or a similar tool to understand when and how many resources are being

5 created, and what they are used for. Note that the PIX plug-in can also give you useful information as to how well managed vertex/index buffers and textures are playing. Generally, knowing how video memory is utilized by an application and optimizing resource allocations can go a long way to providing (or at least setting the stage for) the best runtime performance. References The ATI plug-in for PIX: /atipix/index.html Acknowledgements The author of this paper wishes the thank Tim Kelley of ATI Technologies Inc. for his great patience and detailed explanations, and the ATI ISV Engineering and Application Research teams for their comments and contributions.

Optimizing Direct3D for the GeForce 256 Douglas H. Rogers Please send me your comments/questions/suggestions

Optimizing Direct3D for the GeForce 256 Douglas H. Rogers Please send me your comments/questions/suggestions Optimizing Direct3D for the GeForce 256 Douglas H. Rogers Please send me your comments/questions/suggestions drogers@nvidia.com Transform and Lighting (T&L) Acceleration under Direct3D To enable hardware

More information

Low-Overhead Rendering with Direct3D. Evan Hart Principal Engineer - NVIDIA

Low-Overhead Rendering with Direct3D. Evan Hart Principal Engineer - NVIDIA Low-Overhead Rendering with Direct3D Evan Hart Principal Engineer - NVIDIA Ground Rules No DX9 Need to move fast Big topic in 30 minutes Assuming experienced audience Everything is a tradeoff These are

More information

Could you make the XNA functions yourself?

Could you make the XNA functions yourself? 1 Could you make the XNA functions yourself? For the second and especially the third assignment, you need to globally understand what s going on inside the graphics hardware. You will write shaders, which

More information

The Application Stage. The Game Loop, Resource Management and Renderer Design

The Application Stage. The Game Loop, Resource Management and Renderer Design 1 The Application Stage The Game Loop, Resource Management and Renderer Design Application Stage Responsibilities 2 Set up the rendering pipeline Resource Management 3D meshes Textures etc. Prepare data

More information

Memory Management: Virtual Memory and Paging CS 111. Operating Systems Peter Reiher

Memory Management: Virtual Memory and Paging CS 111. Operating Systems Peter Reiher Memory Management: Virtual Memory and Paging Operating Systems Peter Reiher Page 1 Outline Paging Swapping and demand paging Virtual memory Page 2 Paging What is paging? What problem does it solve? How

More information

Chapter 8 Virtual Memory

Chapter 8 Virtual Memory Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Operating Systems: Internals and Design Principles You re gonna need a bigger boat. Steven

More information

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog RSX Best Practices Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog RSX Best Practices About libgcm Using the SPUs with the RSX Brief overview of GCM Replay December 7 th, 2004

More information

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager Optimizing DirectX Graphics Richard Huddy European Developer Relations Manager Some early observations Bear in mind that graphics performance problems are both commoner and rarer than you d think The most

More information

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager Optimizing for DirectX Graphics Richard Huddy European Developer Relations Manager Also on today from ATI... Start & End Time: 12:00pm 1:00pm Title: Precomputed Radiance Transfer and Spherical Harmonic

More information

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23 FILE SYSTEMS CS124 Operating Systems Winter 2015-2016, Lecture 23 2 Persistent Storage All programs require some form of persistent storage that lasts beyond the lifetime of an individual process Most

More information

Practical Performance Analysis Koji Ashida NVIDIA Developer Technology Group

Practical Performance Analysis Koji Ashida NVIDIA Developer Technology Group Practical Performance Analysis Koji Ashida NVIDIA Developer Technology Group Overview Tools for the analysis Finding pipeline bottlenecks Practice identifying the problems Analysis Tools NVPerfHUD Graph

More information

CS 220: Introduction to Parallel Computing. Introduction to CUDA. Lecture 28

CS 220: Introduction to Parallel Computing. Introduction to CUDA. Lecture 28 CS 220: Introduction to Parallel Computing Introduction to CUDA Lecture 28 Today s Schedule Project 4 Read-Write Locks Introduction to CUDA 5/2/18 CS 220: Parallel Computing 2 Today s Schedule Project

More information

Operating System Principles: Memory Management Swapping, Paging, and Virtual Memory CS 111. Operating Systems Peter Reiher

Operating System Principles: Memory Management Swapping, Paging, and Virtual Memory CS 111. Operating Systems Peter Reiher Operating System Principles: Memory Management Swapping, Paging, and Virtual Memory Operating Systems Peter Reiher Page 1 Outline Swapping Paging Virtual memory Page 2 Swapping What if we don t have enough

More information

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 13 Virtual memory and memory management unit In the last class, we had discussed

More information

PowerVR Performance Recommendations The Golden Rules. October 2015

PowerVR Performance Recommendations The Golden Rules. October 2015 PowerVR Performance Recommendations The Golden Rules October 2015 Paul Ly Developer Technology Engineer, PowerVR Graphics Understanding Your Bottlenecks Based on our experience 3 The Golden Rules 1. The

More information

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques Jonathan Zarge, Team Lead Performance Tools Richard Huddy, European Developer Relations Manager ATI

More information

PowerVR Series5. Architecture Guide for Developers

PowerVR Series5. Architecture Guide for Developers Public Imagination Technologies PowerVR Series5 Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

Slide Set 9. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng

Slide Set 9. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng Slide Set 9 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 369 Winter 2018 Section 01

More information

CS 111. Operating Systems Peter Reiher

CS 111. Operating Systems Peter Reiher Operating System Principles: File Systems Operating Systems Peter Reiher Page 1 Outline File systems: Why do we need them? Why are they challenging? Basic elements of file system design Designing file

More information

15 Sharing Main Memory Segmentation and Paging

15 Sharing Main Memory Segmentation and Paging Operating Systems 58 15 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per

More information

Virtual Memory #2 Feb. 21, 2018

Virtual Memory #2 Feb. 21, 2018 15-410...The mysterious TLB... Virtual Memory #2 Feb. 21, 2018 Dave Eckhardt Brian Railing 1 L16_VM2 Last Time Mapping problem: logical vs. physical addresses Contiguous memory mapping (base, limit) Swapping

More information

Virtual Memory. Chapter 8

Virtual Memory. Chapter 8 Virtual Memory 1 Chapter 8 Characteristics of Paging and Segmentation Memory references are dynamically translated into physical addresses at run time E.g., process may be swapped in and out of main memory

More information

Drawing Fast The Graphics Pipeline

Drawing Fast The Graphics Pipeline Drawing Fast The Graphics Pipeline CS559 Fall 2015 Lecture 9 October 1, 2015 What I was going to say last time How are the ideas we ve learned about implemented in hardware so they are fast. Important:

More information

Rendering Grass with Instancing in DirectX* 10

Rendering Grass with Instancing in DirectX* 10 Rendering Grass with Instancing in DirectX* 10 By Anu Kalra Because of the geometric complexity, rendering realistic grass in real-time is difficult, especially on consumer graphics hardware. This article

More information

Resolve your Resolves Jon Story Holger Gruen AMD Graphics Products Group

Resolve your Resolves Jon Story Holger Gruen AMD Graphics Products Group Jon Story Holger Gruen AMD Graphics Products Group jon.story@amd.com holger.gruen@amd.com Introduction Over the last few years it has become common place for PC games to make use of Multi-Sample Anti-Aliasing

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2015 Lecture 23

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2015 Lecture 23 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 205 Lecture 23 LAST TIME: VIRTUAL MEMORY! Began to focus on how to virtualize memory! Instead of directly addressing physical memory, introduce a level of

More information

Vulkan (including Vulkan Fast Paths)

Vulkan (including Vulkan Fast Paths) Vulkan (including Vulkan Fast Paths) Łukasz Migas Software Development Engineer WS Graphics Let s talk about OpenGL (a bit) History 1.0-1992 1.3-2001 multitexturing 1.5-2003 vertex buffer object 2.0-2004

More information

Chapter 8. Virtual Memory

Chapter 8. Virtual Memory Operating System Chapter 8. Virtual Memory Lynn Choi School of Electrical Engineering Motivated by Memory Hierarchy Principles of Locality Speed vs. size vs. cost tradeoff Locality principle Spatial Locality:

More information

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games Bringing AAA graphics to mobile platforms Niklas Smedberg Senior Engine Programmer, Epic Games Who Am I A.k.a. Smedis Platform team at Epic Games Unreal Engine 15 years in the industry 30 years of programming

More information

Hardware-driven visibility culling

Hardware-driven visibility culling Hardware-driven visibility culling I. Introduction 20073114 김정현 The goal of the 3D graphics is to generate a realistic and accurate 3D image. To achieve this, it needs to process not only large amount

More information

DX10, Batching, and Performance Considerations. Bryan Dudash NVIDIA Developer Technology

DX10, Batching, and Performance Considerations. Bryan Dudash NVIDIA Developer Technology DX10, Batching, and Performance Considerations Bryan Dudash NVIDIA Developer Technology The Point of this talk The attempt to combine wisdom and power has only rarely been successful and then only for

More information

POWERVR MBX. Technology Overview

POWERVR MBX. Technology Overview POWERVR MBX Technology Overview Copyright 2009, Imagination Technologies Ltd. All Rights Reserved. This publication contains proprietary information which is subject to change without notice and is supplied

More information

16 Sharing Main Memory Segmentation and Paging

16 Sharing Main Memory Segmentation and Paging Operating Systems 64 16 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per

More information

Technical Report. SLI Best Practices

Technical Report. SLI Best Practices Technical Report SLI Best Practices Abstract This paper describes techniques that can be used to perform application-side detection of SLI-configured systems, as well as ensure maximum performance scaling

More information

Memory Management Virtual Memory

Memory Management Virtual Memory Memory Management Virtual Memory Part of A3 course (by Theo Schouten) Biniam Gebremichael http://www.cs.ru.nl/~biniam/ Office: A6004 April 4 2005 Content Virtual memory Definition Advantage and challenges

More information

Reengineering II. Transforming the System

Reengineering II. Transforming the System Reengineering II Transforming the System Recap: Reverse Engineering We have a detailed impression of the current state We identified the important parts We identified reengineering opportunities We have

More information

Memory Allocation. Static Allocation. Dynamic Allocation. Dynamic Storage Allocation. CS 414: Operating Systems Spring 2008

Memory Allocation. Static Allocation. Dynamic Allocation. Dynamic Storage Allocation. CS 414: Operating Systems Spring 2008 Dynamic Storage Allocation CS 44: Operating Systems Spring 2 Memory Allocation Static Allocation (fixed in size) Sometimes we create data structures that are fixed and don t need to grow or shrink. Dynamic

More information

Coding OpenGL ES 3.0 for Better Graphics Quality

Coding OpenGL ES 3.0 for Better Graphics Quality Coding OpenGL ES 3.0 for Better Graphics Quality Part 2 Hugo Osornio Rick Tewell A P R 1 1 t h 2 0 1 4 TM External Use Agenda Exercise 1: Array Structure vs Vertex Buffer Objects vs Vertex Array Objects

More information

Here s the general problem we want to solve efficiently: Given a light and a set of pixels in view space, resolve occlusion between each pixel and

Here s the general problem we want to solve efficiently: Given a light and a set of pixels in view space, resolve occlusion between each pixel and 1 Here s the general problem we want to solve efficiently: Given a light and a set of pixels in view space, resolve occlusion between each pixel and the light. 2 To visualize this problem, consider the

More information

printf Debugging Examples

printf Debugging Examples Programming Soap Box Developer Tools Tim Purcell NVIDIA Successful programming systems require at least three tools High level language compiler Cg, HLSL, GLSL, RTSL, Brook Debugger Profiler Debugging

More information

PowerVR Hardware. Architecture Overview for Developers

PowerVR Hardware. Architecture Overview for Developers Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

Chapter 4: Memory Management. Part 1: Mechanisms for Managing Memory

Chapter 4: Memory Management. Part 1: Mechanisms for Managing Memory Chapter 4: Memory Management Part 1: Mechanisms for Managing Memory Memory management Basic memory management Swapping Virtual memory Page replacement algorithms Modeling page replacement algorithms Design

More information

It s possible to get your inbox to zero and keep it there, even if you get hundreds of s a day.

It s possible to get your  inbox to zero and keep it there, even if you get hundreds of  s a day. It s possible to get your email inbox to zero and keep it there, even if you get hundreds of emails a day. It s not super complicated, though it does take effort and discipline. Many people simply need

More information

Technical Report. SLI Best Practices

Technical Report. SLI Best Practices Technical Report SLI Best Practices Abstract This paper describes techniques that can be used to perform application-side detection of SLI-configured systems, as well as ensure maximum performance scaling

More information

Direct3D 11 Performance Tips & Tricks

Direct3D 11 Performance Tips & Tricks Direct3D 11 Performance Tips & Tricks Holger Gruen Cem Cebenoyan AMD ISV Relations NVIDIA ISV Relations Agenda Introduction Shader Model 5 Resources and Resource Views Multithreading Miscellaneous Q&A

More information

Computer Architecture Prof. Smruthi Ranjan Sarangi Department of Computer Science and Engineering Indian Institute of Technology, Delhi

Computer Architecture Prof. Smruthi Ranjan Sarangi Department of Computer Science and Engineering Indian Institute of Technology, Delhi Computer Architecture Prof. Smruthi Ranjan Sarangi Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 32 The Memory Systems Part III Welcome back. (Refer Slide

More information

Beyond Programmable Shading. Scheduling the Graphics Pipeline

Beyond Programmable Shading. Scheduling the Graphics Pipeline Beyond Programmable Shading Scheduling the Graphics Pipeline Jonathan Ragan-Kelley, MIT CSAIL 9 August 2011 Mike s just showed how shaders can use large, coherent batches of work to achieve high throughput.

More information

Cache introduction. April 16, Howard Huang 1

Cache introduction. April 16, Howard Huang 1 Cache introduction We ve already seen how to make a fast processor. How can we supply the CPU with enough data to keep it busy? The rest of CS232 focuses on memory and input/output issues, which are frequently

More information

Why modern versions of OpenGL should be used Some useful API commands and extensions

Why modern versions of OpenGL should be used Some useful API commands and extensions Michał Radziszewski Why modern versions of OpenGL should be used Some useful API commands and extensions Timer Query EXT Direct State Access (DSA) Geometry Programs Position in pipeline Rendering wireframe

More information

Chapter 8 Virtual Memory

Chapter 8 Virtual Memory Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Modified by Rana Forsati for CSE 410 Outline Principle of locality Paging - Effect of page

More information

CS510 Operating System Foundations. Jonathan Walpole

CS510 Operating System Foundations. Jonathan Walpole CS510 Operating System Foundations Jonathan Walpole A Solution to the Gaming Parlor Programming Project The Gaming Parlor - Solution Scenario: Front desk with dice (resource units) Groups request (e.g.,

More information

ArcGIS Runtime: Maximizing Performance of Your Apps. Will Jarvis and Ralf Gottschalk

ArcGIS Runtime: Maximizing Performance of Your Apps. Will Jarvis and Ralf Gottschalk ArcGIS Runtime: Maximizing Performance of Your Apps Will Jarvis and Ralf Gottschalk Agenda ArcGIS Runtime Version 100.0 Architecture How do we measure performance? We will use our internal Runtime Core

More information

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. 1 2 Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters. Crowd rendering in large environments presents a number of challenges,

More information

EECS 487: Interactive Computer Graphics

EECS 487: Interactive Computer Graphics EECS 487: Interactive Computer Graphics Lecture 21: Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan Console Games Why do games look and perform so much better on consoles than on PCs with

More information

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology Graphics Performance Optimisation John Spitzer Director of European Developer Technology Overview Understand the stages of the graphics pipeline Cherchez la bottleneck Once found, either eliminate or balance

More information

Up and Running Software The Development Process

Up and Running Software The Development Process Up and Running Software The Development Process Success Determination, Adaptative Processes, and a Baseline Approach About This Document: Thank you for requesting more information about Up and Running

More information

Lecture 16. Today: Start looking into memory hierarchy Cache$! Yay!

Lecture 16. Today: Start looking into memory hierarchy Cache$! Yay! Lecture 16 Today: Start looking into memory hierarchy Cache$! Yay! Note: There are no slides labeled Lecture 15. Nothing omitted, just that the numbering got out of sequence somewhere along the way. 1

More information

GPU Memory Model. Adapted from:

GPU Memory Model. Adapted from: GPU Memory Model Adapted from: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary J. Katz, University

More information

Mali Developer Resources. Kevin Ho ARM Taiwan FAE

Mali Developer Resources. Kevin Ho ARM Taiwan FAE Mali Developer Resources Kevin Ho ARM Taiwan FAE ARM Mali Developer Tools Software Development SDKs for OpenGL ES & OpenCL OpenGL ES Emulators Shader Development Studio Shader Library Asset Creation Texture

More information

Plot SIZE. How will execution time grow with SIZE? Actual Data. int array[size]; int A = 0;

Plot SIZE. How will execution time grow with SIZE? Actual Data. int array[size]; int A = 0; How will execution time grow with SIZE? int array[size]; int A = ; for (int i = ; i < ; i++) { for (int j = ; j < SIZE ; j++) { A += array[j]; } TIME } Plot SIZE Actual Data 45 4 5 5 Series 5 5 4 6 8 Memory

More information

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal Graphics Hardware, Graphics APIs, and Computation on GPUs Mark Segal Overview Graphics Pipeline Graphics Hardware Graphics APIs ATI s low-level interface for computation on GPUs 2 Graphics Hardware High

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (8 th Week) (Advanced) Operating Systems 8. Virtual Memory 8. Outline Hardware and Control Structures Operating

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 23

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 23 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 208 Lecture 23 LAST TIME: VIRTUAL MEMORY Began to focus on how to virtualize memory Instead of directly addressing physical memory, introduce a level of indirection

More information

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Last Class Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications Basic Timestamp Ordering Optimistic Concurrency Control Multi-Version Concurrency Control C. Faloutsos A. Pavlo Lecture#23:

More information

Vulkan: Scaling to Multiple Threads. Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics

Vulkan: Scaling to Multiple Threads. Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics Vulkan: Scaling to Multiple Threads Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics www.imgtec.com Introduction Who am I? Kevin Sun Working at Imagination Technologies Take responsibility

More information

All Paging Schemes Depend on Locality. VM Page Replacement. Paging. Demand Paging

All Paging Schemes Depend on Locality. VM Page Replacement. Paging. Demand Paging 3/14/2001 1 All Paging Schemes Depend on Locality VM Page Replacement Emin Gun Sirer Processes tend to reference pages in localized patterns Temporal locality» locations referenced recently likely to be

More information

Drawing Fast The Graphics Pipeline

Drawing Fast The Graphics Pipeline Drawing Fast The Graphics Pipeline CS559 Spring 2016 Lecture 10 February 25, 2016 1. Put a 3D primitive in the World Modeling Get triangles 2. Figure out what color it should be Do ligh/ng 3. Position

More information

DRI Memory Management

DRI Memory Management DRI Memory Management Full strength manager wasn't required for traditional usage: Quake3 and glxgears. Perceived to be difficult. Fundamental for modern desktops, offscreen rendering. Talked about for

More information

Optimisation. CS7GV3 Real-time Rendering

Optimisation. CS7GV3 Real-time Rendering Optimisation CS7GV3 Real-time Rendering Introduction Talk about lower-level optimization Higher-level optimization is better algorithms Example: not using a spatial data structure vs. using one After that

More information

MXwendler Fragment Shader Development Reference Version 1.0

MXwendler Fragment Shader Development Reference Version 1.0 MXwendler Fragment Shader Development Reference Version 1.0 This document describes the MXwendler fragmentshader interface. You will learn how to write shaders using the GLSL language standards and the

More information

Recall: Address Space Map. 13: Memory Management. Let s be reasonable. Processes Address Space. Send it to disk. Freeing up System Memory

Recall: Address Space Map. 13: Memory Management. Let s be reasonable. Processes Address Space. Send it to disk. Freeing up System Memory Recall: Address Space Map 13: Memory Management Biggest Virtual Address Stack (Space for local variables etc. For each nested procedure call) Sometimes Reserved for OS Stack Pointer Last Modified: 6/21/2004

More information

Chapter01.fm Page 1 Monday, August 23, :52 PM. Part I of Change. The Mechanics. of Change

Chapter01.fm Page 1 Monday, August 23, :52 PM. Part I of Change. The Mechanics. of Change Chapter01.fm Page 1 Monday, August 23, 2004 1:52 PM Part I The Mechanics of Change The Mechanics of Change Chapter01.fm Page 2 Monday, August 23, 2004 1:52 PM Chapter01.fm Page 3 Monday, August 23, 2004

More information

Topic 18: Virtual Memory

Topic 18: Virtual Memory Topic 18: Virtual Memory COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Virtual Memory Any time you see virtual, think using a level of indirection

More information

Address spaces and memory management

Address spaces and memory management Address spaces and memory management Review of processes Process = one or more threads in an address space Thread = stream of executing instructions Address space = memory space used by threads Address

More information

CS399 New Beginnings. Jonathan Walpole

CS399 New Beginnings. Jonathan Walpole CS399 New Beginnings Jonathan Walpole Memory Management Memory Management Memory a linear array of bytes - Holds O.S. and programs (processes) - Each cell (byte) is named by a unique memory address Recall,

More information

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11 CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11 CS 536 Spring 2015 1 Handling Overloaded Declarations Two approaches are popular: 1. Create a single symbol table

More information

Working with Metal Overview

Working with Metal Overview Graphics and Games #WWDC14 Working with Metal Overview Session 603 Jeremy Sandmel GPU Software 2014 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission

More information

Operating Systems. Overview Virtual memory part 2. Page replacement algorithms. Lecture 7 Memory management 3: Virtual memory

Operating Systems. Overview Virtual memory part 2. Page replacement algorithms. Lecture 7 Memory management 3: Virtual memory Operating Systems Lecture 7 Memory management : Virtual memory Overview Virtual memory part Page replacement algorithms Frame allocation Thrashing Other considerations Memory over-allocation Efficient

More information

Address Translation. Tore Larsen Material developed by: Kai Li, Princeton University

Address Translation. Tore Larsen Material developed by: Kai Li, Princeton University Address Translation Tore Larsen Material developed by: Kai Li, Princeton University Topics Virtual memory Virtualization Protection Address translation Base and bound Segmentation Paging Translation look-ahead

More information

Buffer Management for XFS in Linux. William J. Earl SGI

Buffer Management for XFS in Linux. William J. Earl SGI Buffer Management for XFS in Linux William J. Earl SGI XFS Requirements for a Buffer Cache Delayed allocation of disk space for cached writes supports high write performance Delayed allocation main memory

More information

Rasterization and Graphics Hardware. Not just about fancy 3D! Rendering/Rasterization. The simplest case: Points. When do we care?

Rasterization and Graphics Hardware. Not just about fancy 3D! Rendering/Rasterization. The simplest case: Points. When do we care? Where does a picture come from? Rasterization and Graphics Hardware CS559 Course Notes Not for Projection November 2007, Mike Gleicher Result: image (raster) Input 2D/3D model of the world Rendering term

More information

Raise your VR game with NVIDIA GeForce Tools

Raise your VR game with NVIDIA GeForce Tools Raise your VR game with NVIDIA GeForce Tools Yan An Graphics Tools QA Manager 1 Introduction & tour of Nsight Analyze a geometry corruption bug VR debugging AGENDA System Analysis Tracing GPU Range Profiling

More information

a process may be swapped in and out of main memory such that it occupies different regions

a process may be swapped in and out of main memory such that it occupies different regions Virtual Memory Characteristics of Paging and Segmentation A process may be broken up into pieces (pages or segments) that do not need to be located contiguously in main memory Memory references are dynamically

More information

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer Executive Summary The NVIDIA Quadro2 line of workstation graphics solutions is the first of its kind to feature hardware support for

More information

Lecture 25: Board Notes: Threads and GPUs

Lecture 25: Board Notes: Threads and GPUs Lecture 25: Board Notes: Threads and GPUs Announcements: - Reminder: HW 7 due today - Reminder: Submit project idea via (plain text) email by 11/24 Recap: - Slide 4: Lecture 23: Introduction to Parallel

More information

Profiling and Debugging Games on Mobile Platforms

Profiling and Debugging Games on Mobile Platforms Profiling and Debugging Games on Mobile Platforms Lorenzo Dal Col Senior Software Engineer, Graphics Tools Gamelab 2013, Barcelona 26 th June 2013 Agenda Introduction to Performance Analysis with ARM DS-5

More information

The Operating System. Chapter 6

The Operating System. Chapter 6 The Operating System Machine Level Chapter 6 1 Contemporary Multilevel Machines A six-level l computer. The support method for each level is indicated below it.2 Operating System Machine a) Operating System

More information

Memory management, part 2: outline. Operating Systems, 2017, Danny Hendler and Amnon Meisels

Memory management, part 2: outline. Operating Systems, 2017, Danny Hendler and Amnon Meisels Memory management, part 2: outline 1 Page Replacement Algorithms Page fault forces choice o which page must be removed to make room for incoming page? Modified page must first be saved o unmodified just

More information

Inside the PostgreSQL Shared Buffer Cache

Inside the PostgreSQL Shared Buffer Cache Truviso 07/07/2008 About this presentation The master source for these slides is http://www.westnet.com/ gsmith/content/postgresql You can also find a machine-usable version of the source code to the later

More information

Topic 18 (updated): Virtual Memory

Topic 18 (updated): Virtual Memory Topic 18 (updated): Virtual Memory COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Virtual Memory Any time you see virtual, think using a level

More information

Computergrafik. Matthias Zwicker. Herbst 2010

Computergrafik. Matthias Zwicker. Herbst 2010 Computergrafik Matthias Zwicker Universität Bern Herbst 2010 Today Bump mapping Shadows Shadow mapping Shadow mapping in OpenGL Bump mapping Surface detail is often the result of small perturbations in

More information

Distributed Virtual Reality Computation

Distributed Virtual Reality Computation Jeff Russell 4/15/05 Distributed Virtual Reality Computation Introduction Virtual Reality is generally understood today to mean the combination of digitally generated graphics, sound, and input. The goal

More information

Virtual Memory Outline

Virtual Memory Outline Virtual Memory Outline Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating Kernel Memory Other Considerations Operating-System Examples

More information

CS 326: Operating Systems. Process Execution. Lecture 5

CS 326: Operating Systems. Process Execution. Lecture 5 CS 326: Operating Systems Process Execution Lecture 5 Today s Schedule Process Creation Threads Limited Direct Execution Basic Scheduling 2/5/18 CS 326: Operating Systems 2 Today s Schedule Process Creation

More information

PAGE REPLACEMENT. Operating Systems 2015 Spring by Euiseong Seo

PAGE REPLACEMENT. Operating Systems 2015 Spring by Euiseong Seo PAGE REPLACEMENT Operating Systems 2015 Spring by Euiseong Seo Today s Topics What if the physical memory becomes full? Page replacement algorithms How to manage memory among competing processes? Advanced

More information

CS 136: Advanced Architecture. Review of Caches

CS 136: Advanced Architecture. Review of Caches 1 / 30 CS 136: Advanced Architecture Review of Caches 2 / 30 Why Caches? Introduction Basic goal: Size of cheapest memory... At speed of most expensive Locality makes it work Temporal locality: If you

More information

Memory Management. Reading: Silberschatz chapter 9 Reading: Stallings. chapter 7 EEL 358

Memory Management. Reading: Silberschatz chapter 9 Reading: Stallings. chapter 7 EEL 358 Memory Management Reading: Silberschatz chapter 9 Reading: Stallings chapter 7 1 Outline Background Issues in Memory Management Logical Vs Physical address, MMU Dynamic Loading Memory Partitioning Placement

More information

Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express

Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express Level: Intermediate Area: Graphics Programming Summary This document is an introduction to the series of samples,

More information

Streaming Massive Environments From Zero to 200MPH

Streaming Massive Environments From Zero to 200MPH FORZA MOTORSPORT From Zero to 200MPH Chris Tector (Software Architect Turn 10 Studios) Turn 10 Internal studio at Microsoft Game Studios - we make Forza Motorsport Around 70 full time staff 2 Why am I

More information

Multi-level Translation. CS 537 Lecture 9 Paging. Example two-level page table. Multi-level Translation Analysis

Multi-level Translation. CS 537 Lecture 9 Paging. Example two-level page table. Multi-level Translation Analysis Multi-level Translation CS 57 Lecture 9 Paging Michael Swift Problem: what if you have a sparse address space e.g. out of GB, you use MB spread out need one PTE per page in virtual address space bit AS

More information