Order Matters in Resource Creation
William Damon
ATI Research, Inc.

Introduction

Latencies attributed to loading resources on the fly can seriously impact runtime performance. We typically avoid these hiccups by creating or loading resources ahead of the time we need to use them, but even then the physical locations in which our resources ultimately reside can have a serious impact on overall performance. In this article, we introduce and reinforce some common guidelines to keep in mind when setting up render resources.

Depth/Stencil early

Aside from the obvious resources that are created with the rendering context (i.e. the backbuffer and the optional depth-stencil surface), the best surfaces to create first are additional depth-stencil surfaces and render targets. The APIs generally limit the format and size of depth-stencil surfaces to match those of the co-bound render target surfaces, so the best thing to do is to create one depth-stencil surface for each render target format/size combination for which the application requires such a resource. If the application won't be writing depth or stencil information for a particular render target format/size, then there is no need to create a corresponding depth-stencil surface. Generally, depth-stencil buffers can be shared across corresponding render targets or render passes, so there is no need to create a unique depth-stencil buffer per render target. In practice, most applications require only one, maybe two, depth-stencil buffers in addition to the default one that corresponds to the backbuffer.

Create these depth-stencil surfaces in order of importance to the application. This matters because when depth-stencil buffers are created first (or at least very early), the driver can allocate them in the best location in local video memory, such that the buffers benefit from Hyper-Z technology.
Render targets also early

As stated above, the other resources to create as early as possible are any additional off-screen render targets the application will require. Sometimes it is not possible to know how many additional render targets will be required, or even what format(s) they should take. In that case, a good approach is to use the best heuristics available to make an educated guess as to what might be needed throughout the current resource pool lifetime (e.g. through a single level in a game). Obviously, the tradeoff here is that the application may end up creating render targets that it never uses, wasting valuable memory. This usage pattern, however, may be indicative of a larger problem, and the application architect(s) might consider revisiting the design. Alternatively, this approach may be the only way to implement an algorithm or solve a particular problem. In that case, the best the driver can ask for is that the application creates its render targets as early as possible.
Keep an eye on how many render targets are created and on the size and format of those surfaces. A fullscreen render target at 1024x768 using 8 bits per channel consumes roughly 3MB of space. Add multisampling to that, along with the corresponding multisampled depth-stencil buffer, and you're up to potentially 24MB for one render target! This is why it is important to have a clear understanding of which off-screen render target surface formats and sizes will be required and how many render targets must coexist simultaneously. Creating these surfaces as early as possible allows the driver to place them in the appropriate memory location before spilling into less optimal locations.

Finally, on the topic of render targets, the order of creation in terms of the formats and sizes used does not make much difference. The application will most probably benefit most by allocating the most commonly used render targets first, followed by those used less often.

LVM followed by non-LVM, then system memory

Now that the depth-stencil surfaces and render targets are allocated, the next best things to create are those resources that the application would prefer to have in local video memory (LVM), followed by those that live in non-local video memory, and finally those that reside in system memory. In Direct3D terminology, this translates into allocating D3DPOOL_DEFAULT resources followed by D3DPOOL_MANAGED resources. Actually, managed resources aren't loaded into LVM (or even non-LVM) until they are needed, so really the application should create default pool resources before managed resources are paged in. The most effective way to ensure this is to create the default pool resources first, or to evict managed resources immediately beforehand.

Vertex and index buffers are a good thing to allocate at this point, as are textures. If an application is really pressed for memory, then the ordering of textures versus geometry buffers may differ from that of another application, based on usage patterns.
For instance, an application that uses a lot of geometry but only a few textures may want to make sure all that geometry resides in LVM, whereas an application that uses less geometry but constantly switches textures may be able to afford the latency of fetching geometry from non-LVM memory while performing expensive pixel operations.

When to go static

Sometimes deciding whether a vertex or index buffer should be static or dynamic can be confusing. Adding to the confusion is the fact that index buffers behave slightly differently from vertex buffers. Here we will try to dispel some rumors and provide a bit of information as to how different buffers should be allocated. While this information is Direct3D-centric, the same concepts apply in OpenGL.

Before we begin, however, let's get a bit of terminology out of the way. The term static has multiple meanings, so we cannot blindly say that locking a static buffer is bad. Buffers created without D3DUSAGE_DYNAMIC are not necessarily static, either, as far as the driver is concerned (regardless of vendor). Keep this distinction in mind as we wade through the following discussion.

Index Buffers

Index buffers can be allocated in three memory pools: D3DPOOL_DEFAULT, D3DPOOL_MANAGED, or D3DPOOL_SYSTEMMEM. While buffers in the system memory pool suffer from a little extra overhead when copying data to the hardware, little else about them causes confusion or problems, because everything happens in system memory. Consequently, we focus our discussion here on the default and managed memory locations.
The default pool

An index buffer created in the default pool has the option of providing various usage flags at creation time:

- D3DUSAGE_WRITEONLY: If not set, the driver will not create an LVM buffer. Instead, the Direct3D runtime will create a system memory copy of the resource to be flushed to the GPU upon first use after each update.
- D3DUSAGE_DYNAMIC: This flag indicates that the data in the buffer will change frequently. In particular, this flag stipulates that the contents presently being used in rendering will change frequently. Some drivers do not create a video memory surface in this case, in favor of allowing the Direct3D runtime to create a system memory copy to which the CPU has direct write-combining access.

If D3DUSAGE_WRITEONLY is set without D3DUSAGE_DYNAMIC, current drivers will try to create an LVM buffer. If this fails, then the driver must try to fall back to non-LVM.

Now, whenever the application does a lock on a default pool index buffer, and the buffer is in video memory, the driver receives a lock call. A well-written application will use one of D3DLOCK_DISCARD or D3DLOCK_NOOVERWRITE. D3DLOCK_DISCARD indicates to the driver that it is safe to perform index buffer renaming (i.e. allocate or return another internal buffer without stalling). D3DLOCK_NOOVERWRITE signals that the application is not going to overwrite any of the contents already written (i.e. the driver is safe to return a pointer into the index buffer without stalling). In either case, the driver does not have to stall and the application need only write the data that it is updating. Failure to use these locking flags appropriately will cause the driver to stall until rendering with the current contents of the index buffer has finished.

The managed pool

An index buffer created in the managed pool cannot be marked with the usage flag D3DUSAGE_DYNAMIC; the Direct3D runtime disallows it.
Also, there is no such thing as D3DUSAGE_STATIC at the API level, making life a little more interesting for the driver. When an index buffer is created in the managed pool, the Direct3D runtime creates the resource in system memory. All application locking calls affect only this system memory copy, and all updates happen here. The first time an unlock is made, the Direct3D runtime calls the driver and attempts to create a write-only buffer in video memory that represents the managed buffer. Different vendors' drivers use varying heuristics to determine whether this means allocating space in LVM or non-LVM. The nice thing about lock calls on managed pool resources is that they provide parameters for the offset and the size to lock, making things a bit simpler for the driver. Upon drawing with the index buffer, the runtime presents the driver with some information about how best to transfer the data from the updated system memory host copy into the video memory draw copy. Depending on where the resource ended up residing, various optimized copying mechanisms can be invoked.
Note that allocating non-D3DUSAGE_DYNAMIC index buffers that exhibit dynamic behavior can sometimes be a win, especially on CrossFire (or similar) configurations.

Now, with all this background information in mind, here are three common index buffer usage scenarios and some advice on how to allocate index buffers.

An index buffer that only requires updates to areas that haven't already been written and aren't currently used in a draw

In this case, the application should use the default pool and manage the locks with the locking flags described above. The dynamic usage flag should not be used. Alternatively, the managed pool may not be a bad option; it requires a bit more CPU work, and even some GPU overhead, but there shouldn't be any hardware stalls.

An index buffer that requires updates to areas that have already been written and have been used in a draw call

Here the application can go ahead and create the buffer in the default pool with the dynamic usage flag. Locks will likely not be expensive, even though the locking flags are invalid for D3DUSAGE_DYNAMIC buffers, because the lock should hit a system memory buffer. Again, the managed pool isn't a bad alternative, and for cases in which the index buffer is updated once for several draw calls, the managed pool might be a better approach. Your mileage may vary.

An index buffer that requires updates to the entire buffer every draw in which it's used

Definitely use the default pool and manage the locks with the appropriate locking flags in this case. Do not set the dynamic usage flag, however. The managed pool is not a good option in this scenario.

Vertex Buffers

Vertex buffer creation and usage generally follow the same guidelines as index buffers, though the drivers and even the hardware may handle things a bit differently internally. Drivers generally try to create all vertex buffers in video memory, and dynamic vertex buffers generally end up in non-LVM for better CPU access.
It is strongly recommended NOT to place vertex buffers (static or dynamic) in the system memory pool. The Direct3D runtime essentially behaves the same for vertex and index buffers.

Other tidbits

Always be sure to use the appropriate flags when creating resources through the API. The flags provide tremendous insight to the driver as to how the resource being created will be used, thus giving it clear direction as to where the best location for that resource will be for optimal performance.

Also, avoid creating and destroying resources on the fly, per frame. Resource allocation has tremendous overhead, comparatively, and this behavior can cause fragmentation and other memory-related problems. Occasionally, it makes sense to evict all the managed resources from video memory, such as when switching levels or worlds in a game, and Direct3D provides an API for this. Performing this eviction will clean up much of the fragmentation that may have built up throughout the last level, and provide a clean slate for the next one.

Lastly, should memory be a point of contention for your application, consider using ATI's plug-in for PIX or a similar tool to understand when and how many resources are being created, and what they are used for. Note that the PIX plug-in can also give you useful information as to how well managed vertex/index buffers and textures are playing.

Generally, knowing how video memory is utilized by an application and optimizing resource allocations can go a long way toward providing (or at least setting the stage for) the best runtime performance.

References

The ATI plug-in for PIX: /atipix/index.html

Acknowledgements

The author of this paper wishes to thank Tim Kelley of ATI Technologies Inc. for his great patience and detailed explanations, and the ATI ISV Engineering and Application Research teams for their comments and contributions.
More informationAddress spaces and memory management
Address spaces and memory management Review of processes Process = one or more threads in an address space Thread = stream of executing instructions Address space = memory space used by threads Address
More informationCS399 New Beginnings. Jonathan Walpole
CS399 New Beginnings Jonathan Walpole Memory Management Memory Management Memory a linear array of bytes - Holds O.S. and programs (processes) - Each cell (byte) is named by a unique memory address Recall,
More informationCS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11
CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11 CS 536 Spring 2015 1 Handling Overloaded Declarations Two approaches are popular: 1. Create a single symbol table
More informationWorking with Metal Overview
Graphics and Games #WWDC14 Working with Metal Overview Session 603 Jeremy Sandmel GPU Software 2014 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission
More informationOperating Systems. Overview Virtual memory part 2. Page replacement algorithms. Lecture 7 Memory management 3: Virtual memory
Operating Systems Lecture 7 Memory management : Virtual memory Overview Virtual memory part Page replacement algorithms Frame allocation Thrashing Other considerations Memory over-allocation Efficient
More informationAddress Translation. Tore Larsen Material developed by: Kai Li, Princeton University
Address Translation Tore Larsen Material developed by: Kai Li, Princeton University Topics Virtual memory Virtualization Protection Address translation Base and bound Segmentation Paging Translation look-ahead
More informationBuffer Management for XFS in Linux. William J. Earl SGI
Buffer Management for XFS in Linux William J. Earl SGI XFS Requirements for a Buffer Cache Delayed allocation of disk space for cached writes supports high write performance Delayed allocation main memory
More informationRasterization and Graphics Hardware. Not just about fancy 3D! Rendering/Rasterization. The simplest case: Points. When do we care?
Where does a picture come from? Rasterization and Graphics Hardware CS559 Course Notes Not for Projection November 2007, Mike Gleicher Result: image (raster) Input 2D/3D model of the world Rendering term
More informationRaise your VR game with NVIDIA GeForce Tools
Raise your VR game with NVIDIA GeForce Tools Yan An Graphics Tools QA Manager 1 Introduction & tour of Nsight Analyze a geometry corruption bug VR debugging AGENDA System Analysis Tracing GPU Range Profiling
More informationa process may be swapped in and out of main memory such that it occupies different regions
Virtual Memory Characteristics of Paging and Segmentation A process may be broken up into pieces (pages or segments) that do not need to be located contiguously in main memory Memory references are dynamically
More informationPer-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer
Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer Executive Summary The NVIDIA Quadro2 line of workstation graphics solutions is the first of its kind to feature hardware support for
More informationLecture 25: Board Notes: Threads and GPUs
Lecture 25: Board Notes: Threads and GPUs Announcements: - Reminder: HW 7 due today - Reminder: Submit project idea via (plain text) email by 11/24 Recap: - Slide 4: Lecture 23: Introduction to Parallel
More informationProfiling and Debugging Games on Mobile Platforms
Profiling and Debugging Games on Mobile Platforms Lorenzo Dal Col Senior Software Engineer, Graphics Tools Gamelab 2013, Barcelona 26 th June 2013 Agenda Introduction to Performance Analysis with ARM DS-5
More informationThe Operating System. Chapter 6
The Operating System Machine Level Chapter 6 1 Contemporary Multilevel Machines A six-level l computer. The support method for each level is indicated below it.2 Operating System Machine a) Operating System
More informationMemory management, part 2: outline. Operating Systems, 2017, Danny Hendler and Amnon Meisels
Memory management, part 2: outline 1 Page Replacement Algorithms Page fault forces choice o which page must be removed to make room for incoming page? Modified page must first be saved o unmodified just
More informationInside the PostgreSQL Shared Buffer Cache
Truviso 07/07/2008 About this presentation The master source for these slides is http://www.westnet.com/ gsmith/content/postgresql You can also find a machine-usable version of the source code to the later
More informationTopic 18 (updated): Virtual Memory
Topic 18 (updated): Virtual Memory COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1 Virtual Memory Any time you see virtual, think using a level
More informationComputergrafik. Matthias Zwicker. Herbst 2010
Computergrafik Matthias Zwicker Universität Bern Herbst 2010 Today Bump mapping Shadows Shadow mapping Shadow mapping in OpenGL Bump mapping Surface detail is often the result of small perturbations in
More informationDistributed Virtual Reality Computation
Jeff Russell 4/15/05 Distributed Virtual Reality Computation Introduction Virtual Reality is generally understood today to mean the combination of digitally generated graphics, sound, and input. The goal
More informationVirtual Memory Outline
Virtual Memory Outline Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating Kernel Memory Other Considerations Operating-System Examples
More informationCS 326: Operating Systems. Process Execution. Lecture 5
CS 326: Operating Systems Process Execution Lecture 5 Today s Schedule Process Creation Threads Limited Direct Execution Basic Scheduling 2/5/18 CS 326: Operating Systems 2 Today s Schedule Process Creation
More informationPAGE REPLACEMENT. Operating Systems 2015 Spring by Euiseong Seo
PAGE REPLACEMENT Operating Systems 2015 Spring by Euiseong Seo Today s Topics What if the physical memory becomes full? Page replacement algorithms How to manage memory among competing processes? Advanced
More informationCS 136: Advanced Architecture. Review of Caches
1 / 30 CS 136: Advanced Architecture Review of Caches 2 / 30 Why Caches? Introduction Basic goal: Size of cheapest memory... At speed of most expensive Locality makes it work Temporal locality: If you
More informationMemory Management. Reading: Silberschatz chapter 9 Reading: Stallings. chapter 7 EEL 358
Memory Management Reading: Silberschatz chapter 9 Reading: Stallings chapter 7 1 Outline Background Issues in Memory Management Logical Vs Physical address, MMU Dynamic Loading Memory Partitioning Placement
More informationShader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express
Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express Level: Intermediate Area: Graphics Programming Summary This document is an introduction to the series of samples,
More informationStreaming Massive Environments From Zero to 200MPH
FORZA MOTORSPORT From Zero to 200MPH Chris Tector (Software Architect Turn 10 Studios) Turn 10 Internal studio at Microsoft Game Studios - we make Forza Motorsport Around 70 full time staff 2 Why am I
More informationMulti-level Translation. CS 537 Lecture 9 Paging. Example two-level page table. Multi-level Translation Analysis
Multi-level Translation CS 57 Lecture 9 Paging Michael Swift Problem: what if you have a sparse address space e.g. out of GB, you use MB spread out need one PTE per page in virtual address space bit AS
More information