
Sample Tweaker: Ocean Fog

Overview

This paper discusses how we optimized an existing graphics demo, Ocean Fog, for our latest processors with Intel Integrated Graphics. We achieved a 4x boost in performance (40 FPS to 160 FPS) with little to no fidelity loss by applying techniques such as reducing texture sizes and lowering precision. These optimization techniques are not revolutionary, but knowing when to apply them can be more involved.

To identify where we might be able to optimize, we used Intel's graphics profiler, Intel Graphics Performance Analyzers, or Intel GPA for short. We use screenshots of Intel GPA to show how we identified each graphics bottleneck, and then describe how we tried to optimize or fix those problem areas.

Understanding the architecture you are optimizing for can really help you decide how to fix problem areas. That said, Intel GPA lets you run different tests against problem areas to identify the problems and possible fixes without an intimate knowledge of the architecture. In this paper, our tests carry labels such as "2x2 textures" or "simple pixel shader." Those tests are built into Intel GPA; you do not have to modify the existing application to run them.

The purpose of the original Ocean Fog project was to investigate how to render a realistic ocean scene effectively on differing graphics solutions while providing the graphics community with a good, current, working data set. The ocean was rendered using a projected grid that is displayed orthogonally to the viewer. The vertices of the grid are displaced using a height field, and Perlin noise was used to generate the wave motion. In the original paper, the author notes that computing Perlin noise was less CPU-intensive than other methods.
However, other methods such as Navier-Stokes work better on the GPU side, and the author mentions that they are worth further investigation. Snell's law was used for reflection and refraction. For more information on how the water was rendered, see Claes Johanson's Master's thesis, "Real-time water rendering - Introducing the projected grid concept."

The fog was also generated using Perlin noise, and its processing was also done on the CPU side by sampling points in the 3D texture space. There are two lights in the scene: one infinite (directional) light, and one spotlight casting from the lighthouse. For further discussion, see "Ocean Fog using Direct3D 10."

Optimization Summary

The original application was running at 40 FPS on our test hardware [1]; after all optimizations, it was running 4x faster at 160 FPS. CPU utilization went from 8% to 84%, GPU active time from 32% to 85%, and GPU stall time from 53% to 9%.

[1] We used three systems. First, an Intel microarchitecture codename Sandy Bridge processor-based platform with a 2.4 GHz processor running 64-bit Microsoft Windows* 7, 4 GB of memory, and an 80 GB solid-state disk. Second, an Intel Core i5 640 processor-based platform with a 3.2 GHz processor running 32-bit Microsoft Windows Vista*, 2 GB of memory, and a Seagate* 7200 RPM 500 GB disk. Third, an Intel Core 2 Duo T7700 processor-based system with a 2.4 GHz processor running 64-bit Microsoft Windows* 7, 4 GB of memory, an 80 GB solid-state disk, and an NVIDIA Quadro* FX-570 graphics card.
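Because the results are reported both in FPS and in msec/frame, a small conversion helper makes the summary concrete. This is a minimal sketch; the function names are our own and not part of Intel GPA. Frame time is usually the more useful unit for comparing optimizations, because millisecond savings from separate passes add up, while FPS gains do not.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of old to new frame time; 4.0 means four times faster."""
    return before_ms / after_ms


# Figures from the optimization summary above.
original_ms = frame_time_ms(40.0)    # 25.0 ms per frame
optimized_ms = frame_time_ms(160.0)  # 6.25 ms per frame
print(f"{original_ms:.2f} ms -> {optimized_ms:.2f} ms "
      f"({speedup(original_ms, optimized_ms):.1f}x, "
      f"{original_ms - optimized_ms:.2f} ms saved per frame)")
```

Run as-is, this prints `25.00 ms -> 6.25 ms (4.0x, 18.75 ms saved per frame)`. The same helper turns the later 52 FPS to 73 FPS clear-call result into roughly 19.2 ms to 13.7 ms per frame.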

The largest slowdown in the application was the generation and shading of the water. Optimizations included normal map size and depth reduction, reflection and refraction map size and depth reduction, code and resource cleanup, and rendering fixes.

Output

Figure 1 - Ocean Fog Results: final results (msec/frame), original vs. optimized, on Intel microarchitecture codename Sandy Bridge, Intel Core i5-661 with Intel HD Graphics, and Intel Core 2 Duo T7700 with an NVIDIA Quadro* FX 570M card.

Intel microarchitecture codename Sandy Bridge showed a 4x improvement over the original.

Below are side-by-side screenshots of the Intel GPA System Analyzer application before and after optimizations. The application has several line charts that show the activity levels of the CPU and GPU. The first line chart (at the top) shows the frame rate. The second chart down shows the CPU utilization by processor. The third chart down shows the GPU EU active % time, which is roughly the utilization of the GPU. The fourth chart down is the GPU % busy time, which is the GPU active time plus the GPU stalled time. The fifth chart down is the GPU % stalled time.

Stalls are a general bucket that can refer to many conditions. For example, the GPU could be stalled on a texture fetch from the sampler or on a vertex fetch from main memory. As you can see here, the application is stalled on the GPU a good percentage of the time and is hardly utilizing either the CPU or the GPU. When you are looking for possible optimizations, this pattern is typically a good sign, because it means there is unused hardware headroom to recover.

Figure: Intel GPA System Analyzer, original vs. optimized (no overrides). The optimized build shows a significant increase in GPU percent active time and a significant decrease in GPU percent stalled time.
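The relationship between the charts (busy time = active time + stalled time) can be checked against the averages in the Optimization Summary. The sketch below is illustrative only: the class and field names are hypothetical, and summing two capture-wide averages is an approximation of what the tool actually plots.

```python
from dataclasses import dataclass


@dataclass
class GpuSample:
    """Average GPU counters from a capture, in percent of wall-clock time."""
    active_pct: float   # EU active time, roughly GPU utilization
    stalled_pct: float  # time the GPU was stalled (e.g. on texture or vertex fetches)

    @property
    def busy_pct(self) -> float:
        # Busy time is defined as active time plus stalled time.
        return self.active_pct + self.stalled_pct

    @property
    def stalled_share_of_busy(self) -> float:
        # Fraction of busy time spent waiting instead of executing.
        return self.stalled_pct / self.busy_pct


# Numbers from the optimization summary (GPU active and stall time).
original = GpuSample(active_pct=32.0, stalled_pct=53.0)
optimized = GpuSample(active_pct=85.0, stalled_pct=9.0)

for label, sample in (("original", original), ("optimized", optimized)):
    print(f"{label}: busy {sample.busy_pct:.0f}%, "
          f"{sample.stalled_share_of_busy:.0%} of busy time stalled")
```

The GPU was busy for a similar share of time in both builds (85% vs. 94%), but in the original most of that busy time was spent stalled, which is the pattern the text describes.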

Below are side-by-side screenshots of the Intel GPA Frame Analyzer tool. The tool shows the performance of each individual draw call (called an "erg" in the tool) and lets you run experiments to see how you might improve that particular draw call. Ergs that take longer are represented by taller bars in the bar chart, so those are generally a good place to start. It is hard to tell from the screenshots, but if you look at the Y-axis scale of the bar chart, you will notice that the biggest erg went from over 12,000 microseconds to around 2,500 microseconds.

Figure: Intel GPA Frame Analyzer, original vs. optimized. The two largest ergs went from 62.5% of scene time to 35.7% of scene time.
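The largest erg got about 4.8x faster, but a single erg is only part of the frame, so the whole frame speeds up less. Amdahl's law gives a rough estimate of the overall effect. This is a minimal sketch that treats ergs as running one after another with the rest of the frame unchanged, a simplification of real GPU behavior; the share of frame time used below is hypothetical, since the article reports only the combined share of the two largest ergs.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of frame time gets `local_speedup` faster.

    Assumes the rest of the frame is unchanged and that ergs execute serially.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Largest erg: over 12,000 us before, around 2,500 us after (figures from the text).
erg_speedup = 12_000 / 2_500  # 4.8x for that erg alone

# Hypothetical shares of frame time spent in that erg; the article does not give it.
for share in (0.25, 0.50):
    print(f"erg at {share:.0%} of the frame -> "
          f"{amdahl_speedup(share, erg_speedup):.2f}x overall")
```

This is why the remaining ergs matter once the biggest bar shrinks: the unchanged part of the frame caps the overall gain.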

Next is another side-by-side screenshot of Intel GPA System Analyzer. This time we compare the original profile against the 2x2 textures override. "2x2 textures" is an override in the tool that replaces every texture the GPU uses with a simple 2x2 texture. This is a good way to find out whether you are stalled because of the size of your textures. Below, the GPU stalled time (bottom chart) drops from around 57% to 17% with 2x2 textures. This tells us that our textures are too big for the sampler cache to hold efficiently.

Figure: Intel GPA System Analyzer, original with no overrides vs. original with 2x2 textures.

Note: The Core i5-661 processor-based system with Intel HD Graphics showed similar results.

Performance Analysis Overview

Intel GPA System Analyzer showed nearly 60% stall time initially. The 2x2 textures override produced the greatest reduction in GPU stall time, to about 17%.

Normal Map Generation

Figure: Calculation of the water normal map (1024x2048, RGBA 32 bit); the pixel shader takes substantially more time.

Optimizations

Scaling the water normal map

Generating the water normal map took about 32 MB, and Intel GPA Frame Analyzer showed it to be the most costly erg. Remember the tall yellow bar that went from 12,000 microseconds to 2,500 microseconds in the side-by-side screenshot of Intel GPA Frame Analyzer? This is what it actually represents in the scene.

We experimented with the size of the normal map to see how it affected performance and visual fidelity; our goal throughout optimization was to preserve visual fidelity as much as possible. The table below shows that reducing the size of the normal map greatly reduced the time of that erg. The three side-by-side screenshots of the water show some falloff in visual fidelity as the size is reduced.

Table: Normal map build resolution vs. msec/frame (original 1024-wide build; optimized 1024-, 512-, and 256-wide builds).

Figure 2 - Normal map sizes: 1024, 512, 256
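The 32 MB figure is consistent with a 1024x2048 map only if "RGBA 32 bit" means 32 bits per channel (16 bytes per texel, as in a 128-bit floating-point format); at 32 bits per texel the map would be 8 MB. The sketch below makes that arithmetic explicit under that assumption, and also assumes the smaller builds halve both dimensions (the source gives only the widths). Each halving of both dimensions cuts memory, and the number of texels the shader writes, by 4x.

```python
MIB = 1024 * 1024


def texture_bytes(width: int, height: int, channels: int = 4,
                  bits_per_channel: int = 32) -> int:
    """Size in bytes of one uncompressed mip level, ignoring padding."""
    return width * height * channels * bits_per_channel // 8


# Assumption: "RGBA 32 bit" means 32 bits per channel (16 bytes per texel),
# and the smaller builds halve both dimensions to keep the 1:2 aspect ratio.
for width, height in ((1024, 2048), (512, 1024), (256, 512)):
    size = texture_bytes(width, height)
    print(f"{width}x{height}: {size / MIB:g} MiB")
```

Under these assumptions the three builds need 32, 8, and 2 MiB respectively.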

By also adjusting other water noise settings (falloff and scale), we made the quality look almost as good as the original 1024x2048 normal map without the frame rate penalty, while maintaining the crisp water effect.

Figure 3 - Normal map: 1024 vs. 256 (settings tweaked)

Water Shading

In the Intel GPA Frame Analyzer side-by-side screenshot below, we show the water-shading erg before and after applying the 2x2 Textures experiment within the tool. This erg is definitely texture-bound: with the 2x2 Textures experiment applied, it took 32.2% less GPU time.

Figure: Water shading erg before and after the 2x2 Textures experiment (32.2% less GPU time after).

We also inspected the textures in the pipeline. The reflection, refraction, and normal maps were 32-bit textures, so the optimizations here were to reduce the size and depth of those textures without losing much fidelity.

Figure 4 - Graphics pipeline showing 32-bit textures

Water.fx sampled textures in the pixel shader, original build:

    Type         Description  Resolution         Depth
    TextureCube  Environment  1024 x 1024 x 6    8 bit
    Texture3D    Fog          50 x 50 x 50       8 bit
    Texture2D    Fresnel      256 x 1            8 bit
    Texture2D    Normal       1024 x 2048        32 bit
    Texture2D    Refraction   512 x 512          32 bit
    Texture2D    Reflection   512 x 512          32 bit

Optimizations

The reflection, refraction, and normal maps were changed from 32-bit to 16-bit textures, and the resulting changes in frame rate were noted. Reducing the dimensions of the reflection and refraction maps showed the greatest improvement, followed by switching to 16-bit depth.

Table: Texture depth changes (msec/frame) for the reflection/refraction maps and the normal map, RGBA 32 bit vs. RGBA 16 bit.

Table: Reflection/refraction map dimension changes (msec/frame): 512x512 RGBA 32 bit; 256x256 RGBA 32 bit; 256x256 RGBA 16 bit.

Together, the map reduction and 16-bit depth provided a 1.5x improvement.

The depth change from 32-bit to 16-bit produced a slightly grainier normal map. Reducing the reflection and refraction maps to 256x256 produced visible pixelation only when the wave amplitude was zero, so that there was no water distortion. With any wave amplitude or water distortion, the pixelation could not be seen, and once fog was added, the difference in image fidelity was no longer visible at all.

Figure 5 - Reflection/Refraction Map: 32 bit vs. 16 bit

Figure 6 - Reflection/Refraction Map: 256x256 vs. 512x512

Next, the skybox (1024x1024x6) was replaced with a smaller version (256x256x6). Because the texture is a soft, unfocused gradient, there was no change in fidelity. With all other objects turned off in the scene, this gave about a 3% FPS increase.
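The same footprint arithmetic applies to the reflection/refraction maps and the skybox. This sketch assumes RGBA textures and uses DXGI format names only to illustrate the three bit depths; the article does not say which formats Water.fx actually used. The 24 MiB result for the 1024x1024x6 cubemap matches the 24 MB the appendix reports for cubemap-newer.dds.

```python
MIB = 1024 * 1024

# Bytes per texel for a few DXGI formats. Which formats the demo actually
# used is an assumption; the article only gives "32 bit", "16 bit", "8 bit".
BYTES_PER_TEXEL = {
    "DXGI_FORMAT_R32G32B32A32_FLOAT": 16,
    "DXGI_FORMAT_R16G16B16A16_FLOAT": 8,
    "DXGI_FORMAT_R8G8B8A8_UNORM": 4,
}


def surface_bytes(width: int, height: int, fmt: str, faces: int = 1) -> int:
    """Bytes for the top mip level of a 2D texture or cubemap (faces=6)."""
    return width * height * faces * BYTES_PER_TEXEL[fmt]


sizes = {
    "reflection/refraction 512x512, 32-bit":
        surface_bytes(512, 512, "DXGI_FORMAT_R32G32B32A32_FLOAT"),
    "reflection/refraction 256x256, 32-bit":
        surface_bytes(256, 256, "DXGI_FORMAT_R32G32B32A32_FLOAT"),
    "reflection/refraction 256x256, 16-bit":
        surface_bytes(256, 256, "DXGI_FORMAT_R16G16B16A16_FLOAT"),
    "skybox 1024x1024x6, 8-bit":
        surface_bytes(1024, 1024, "DXGI_FORMAT_R8G8B8A8_UNORM", faces=6),
    "skybox 256x256x6, 8-bit":
        surface_bytes(256, 256, "DXGI_FORMAT_R8G8B8A8_UNORM", faces=6),
}

for name, size in sizes.items():
    print(f"{name}: {size / MIB:g} MiB")
```

Each reflection or refraction map shrinks 8x (4 MiB to 0.5 MiB) and the skybox 16x, yet the measured gain for the maps was about 1.5x; memory footprint is only a rough proxy for sampler cost, which is why profiling with the 2x2 textures experiment is still needed.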

Figure 7 - 1024x1024 vs. 256x256 Cubemap

Miscellaneous Optimizations

Removing unnecessary clear calls

Clearing of the reflection and refraction render targets was disabled whenever those passes were unchecked in the GUI. This raised the frame rate from 52 FPS (water render only) to about 73 FPS. When everything else in the scene was rendered, the frame rate dropped when reflection was disabled; however, this behavior was also observed with the original build. When the passes are enabled, the targets must still be cleared every frame, because the reflection and refraction maps change as the camera moves.

MIP Generation

We thought generating the additional MIP levels for the normal map might help, so we tried the experiment, but it did not show a significant change in FPS.

Table: MIP generation (msec/frame) for normal map sizes 1024x2048 32 bit, 1024x2048 16 bit, and 256x512 16 bit, with one MIP level vs. all eight levels.

Offloading GPU work

Because the CPU is not being fully utilized, one possible optimization would be to move more work from the GPU to the CPU. Disabling the two largest shaders, which compute the height and normal maps, increased the frame rate by about 14%. One possible implementation would be to generate the normal and height map information on the CPU and pass it along with the rest of the vertex data. This could also greatly reduce frame time; we thought it would be an interesting experiment, but we did not have time to try it.

Table: Disabling shader work (msec/frame) by normal map size, with and without the GPU work.

Summary

We achieved a 4x performance improvement in this application by using Intel GPA to identify the GPU bottlenecks and possible solutions. The application was stalled mainly on textures, so reducing their size and precision gave us a substantial performance gain. In doing so, we lost some minor visual fidelity, and in some cases we mitigated that loss by varying other simulation parameters.
Consider the optimizations we made on a case-by-case basis, because sometimes you need that extra precision or fidelity to convey to the user what is visually important in your application or game. The techniques and tools we used can be applied to any graphics application to troubleshoot performance problems, so consider them on your next optimization adventure.

About the Author

Jeff Laflam is a software engineer in the Intel Software and Services Group, where he supports Intel graphics solutions in the Visual Computing Software Division.

Optimization Notice

Refer to our Optimization Notice for more information regarding performance and optimization choices in Intel software products.

Appendix

General Procedure

Various application overrides were created, such as disabling the water, island, lighthouse spotlight, clouds, and sky (Figure 8). Disabling the water showed the greatest improvement in frame rate. The application overrides were used in conjunction with Intel GPA. Slowdowns were further narrowed down through Intel GPA System Analyzer, which showed that the 2x2 textures override gave the greatest frame rate improvement. Finally, Intel GPA Frame Analyzer was used to pinpoint the exact ergs in which the slowdowns occurred. Intel GPA Frame Analyzer also provided the 2x2 textures experiment and the texture information in the pipeline, which gave a better measure of what each erg was using.

Figure 8 - Application Overrides

Final Results

Miscellaneous

Figure: Original with the 2x2 textures override vs. optimized with no overrides. The optimized version shows better results than the original with the 2x2 textures override.

Textures

There are 27 textures (29.5 MB) in the textures folder. The skybox, cubemap-newer.dds, accounts for 24 MB of the total. Additional textures are generated procedurally at startup; the first is the fog texture, a 50x50x50 8-bit texture. The application takes up about MB of memory before starting (before optimizations). This can come from file reads, texture generation, and various inefficiencies. Other textures need to be adjusted for proper use, as some are too large or too small for their surface.


More information

Getting fancy with texture mapping (Part 2) CS559 Spring Apr 2017

Getting fancy with texture mapping (Part 2) CS559 Spring Apr 2017 Getting fancy with texture mapping (Part 2) CS559 Spring 2017 6 Apr 2017 Review Skyboxes as backdrops Credits : Flipmode 3D Review Reflection maps Credits : NVidia Review Decal textures Credits : andreucabre.com

More information

Raise your VR game with NVIDIA GeForce Tools

Raise your VR game with NVIDIA GeForce Tools Raise your VR game with NVIDIA GeForce Tools Yan An Graphics Tools QA Manager 1 Introduction & tour of Nsight Analyze a geometry corruption bug VR debugging AGENDA System Analysis Tracing GPU Range Profiling

More information

CS 465 Program 5: Ray II

CS 465 Program 5: Ray II CS 465 Program 5: Ray II out: Friday 2 November 2007 due: Saturday 1 December 2007 Sunday 2 December 2007 midnight 1 Introduction In the first ray tracing assignment you built a simple ray tracer that

More information

Vulkan Multipass mobile deferred done right

Vulkan Multipass mobile deferred done right Vulkan Multipass mobile deferred done right Hans-Kristian Arntzen Marius Bjørge Khronos 5 / 25 / 2017 Content What is multipass? What multipass allows... A driver to do versus MRT Developers to do Transient

More information

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST CS 380 - GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1 Markus Hadwiger, KAUST Reading Assignment #2 (until Feb. 17) Read (required): GLSL book, chapter 4 (The OpenGL Programmable

More information

Interactive Cloth Simulation. Matthias Wloka NVIDIA Corporation

Interactive Cloth Simulation. Matthias Wloka NVIDIA Corporation Interactive Cloth Simulation Matthias Wloka NVIDIA Corporation MWloka@nvidia.com Overview Higher-order surfaces Vertex-shader deformations Lighting modes Per-vertex diffuse Per-pixel diffuse with bump-map

More information

Working with Metal Overview

Working with Metal Overview Graphics and Games #WWDC14 Working with Metal Overview Session 603 Jeremy Sandmel GPU Software 2014 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission

More information

Advanced Maya e Texturing. and Lighting. Second Edition WILEY PUBLISHING, INC.

Advanced Maya e Texturing. and Lighting. Second Edition WILEY PUBLISHING, INC. Advanced Maya e Texturing and Lighting Second Edition Lee Lanier WILEY PUBLISHING, INC. Contents Introduction xvi Chapter 1 Understanding Lighting, Color, and Composition 1 Understanding the Art of Lighting

More information

Optimisation. CS7GV3 Real-time Rendering

Optimisation. CS7GV3 Real-time Rendering Optimisation CS7GV3 Real-time Rendering Introduction Talk about lower-level optimization Higher-level optimization is better algorithms Example: not using a spatial data structure vs. using one After that

More information

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology

Point Cloud Filtering using Ray Casting by Eric Jensen 2012 The Basic Methodology Point Cloud Filtering using Ray Casting by Eric Jensen 01 The Basic Methodology Ray tracing in standard graphics study is a method of following the path of a photon from the light source to the camera,

More information

Realistic and Fast Cloud Rendering in Computer Games. Niniane Wang Software Engineer Microsoft Flight Simulator (now at Google Inc) Intro Video

Realistic and Fast Cloud Rendering in Computer Games. Niniane Wang Software Engineer Microsoft Flight Simulator (now at Google Inc) Intro Video Realistic and Fast Cloud Rendering in Computer Games Niniane Wang Software Engineer Microsoft Flight Simulator (now at Google Inc) Intro Video 1 Agenda Previous Work 3-D Modeling + Art Pipeline Performance

More information

Real-time Graphics 6. Reflections, Refractions

Real-time Graphics 6. Reflections, Refractions 6. Reflections, Refractions Blending Simulating transparent materials Alpha value, RGBA model Using alpha test fragment alpha value is tested against given constant Using blending of colors fragment color

More information

Evolution of GPUs Chris Seitz

Evolution of GPUs Chris Seitz Evolution of GPUs Chris Seitz Overview Concepts: Real-time rendering Hardware graphics pipeline Evolution of the PC hardware graphics pipeline: 1995-1998: Texture mapping and z-buffer 1998: Multitexturing

More information

Plot SIZE. How will execution time grow with SIZE? Actual Data. int array[size]; int A = 0;

Plot SIZE. How will execution time grow with SIZE? Actual Data. int array[size]; int A = 0; How will execution time grow with SIZE? int array[size]; int A = ; for (int i = ; i < ; i++) { for (int j = ; j < SIZE ; j++) { A += array[j]; } TIME } Plot SIZE Actual Data 45 4 5 5 Series 5 5 4 6 8 Memory

More information

Hardware Displacement Mapping

Hardware Displacement Mapping Matrox's revolutionary new surface generation technology, (HDM), equates a giant leap in the pursuit of 3D realism. Matrox is the first to develop a hardware implementation of displacement mapping and

More information

Chapter 6- Lighting and Cameras

Chapter 6- Lighting and Cameras Lighting Types and Settings When you create a scene in Blender, you start with a few basic elements that will include a camera, but may or may not include a light. Remember that what the camera sees is

More information

Drawing Fast The Graphics Pipeline

Drawing Fast The Graphics Pipeline Drawing Fast The Graphics Pipeline CS559 Fall 2015 Lecture 9 October 1, 2015 What I was going to say last time How are the ideas we ve learned about implemented in hardware so they are fast. Important:

More information

Advanced Distant Light for DAZ Studio

Advanced Distant Light for DAZ Studio Contents Advanced Distant Light for DAZ Studio Introduction Important Concepts Quick Start Quick Tips Parameter Settings Light Group Shadow Group Lighting Control Group Known Issues Introduction The Advanced

More information

Direct Rendering of Trimmed NURBS Surfaces

Direct Rendering of Trimmed NURBS Surfaces Direct Rendering of Trimmed NURBS Surfaces Hardware Graphics Pipeline 2/ 81 Hardware Graphics Pipeline GPU Video Memory CPU Vertex Processor Raster Unit Fragment Processor Render Target Screen Extended

More information

Zeyang Li Carnegie Mellon University

Zeyang Li Carnegie Mellon University Zeyang Li Carnegie Mellon University Recap: Texture Mapping Programmable Graphics Pipeline Bump Mapping Displacement Mapping Environment Mapping GLSL Overview Perlin Noise GPGPU Map reflectance over a

More information

03 RENDERING PART TWO

03 RENDERING PART TWO 03 RENDERING PART TWO WHAT WE HAVE SO FAR: GEOMETRY AFTER TRANSFORMATION AND SOME BASIC CLIPPING / CULLING TEXTURES AND MAPPING MATERIAL VISUALLY DISTINGUISHES 2 OBJECTS WITH IDENTICAL GEOMETRY FOR NOW,

More information

CS232: Computer Architecture II

CS232: Computer Architecture II CS232: Computer Architecture II Spring 23 January 22, 23 21-23 Howard Huang 1 What is computer architecture about? Computer architecture is the study of building entire computer systems. Processor Memory

More information

Chapter Answers. Appendix A. Chapter 1. This appendix provides answers to all of the book s chapter review questions.

Chapter Answers. Appendix A. Chapter 1. This appendix provides answers to all of the book s chapter review questions. Appendix A Chapter Answers This appendix provides answers to all of the book s chapter review questions. Chapter 1 1. What was the original name for the first version of DirectX? B. Games SDK 2. Which

More information

1 Hardware virtualization for shading languages Group Technical Proposal

1 Hardware virtualization for shading languages Group Technical Proposal 1 Hardware virtualization for shading languages Group Technical Proposal Executive Summary The fast processing speed and large memory bandwidth of the modern graphics processing unit (GPU) will make it

More information

I ll be speaking today about BC6H compression, how it works, how to compress it in realtime and what s the motivation behind doing that.

I ll be speaking today about BC6H compression, how it works, how to compress it in realtime and what s the motivation behind doing that. Hi everyone, my name is Krzysztof Narkowicz. I m the Lead Engine Programmer at Flying Wild Hog. It s a small company based in Poland and we are mainly making some old-school first person shooter games

More information

Chapter 10 Computation Culling with Explicit Early-Z and Dynamic Flow Control

Chapter 10 Computation Culling with Explicit Early-Z and Dynamic Flow Control Chapter 10 Computation Culling with Explicit Early-Z and Dynamic Flow Control Pedro V. Sander ATI Research John R. Isidoro ATI Research Jason L. Mitchell ATI Research Introduction In last year s course,

More information

A Bandwidth Effective Rendering Scheme for 3D Texture-based Volume Visualization on GPU

A Bandwidth Effective Rendering Scheme for 3D Texture-based Volume Visualization on GPU for 3D Texture-based Volume Visualization on GPU Won-Jong Lee, Tack-Don Han Media System Laboratory (http://msl.yonsei.ac.k) Dept. of Computer Science, Yonsei University, Seoul, Korea Contents Background

More information

VISUAL QUALITY ASSESSMENT CHALLENGES FOR ARCHITECTURE DESIGN EXPLORATIONS. Wen-Fu Kao and Durgaprasad Bilagi. Intel Corporation Folsom, CA 95630

VISUAL QUALITY ASSESSMENT CHALLENGES FOR ARCHITECTURE DESIGN EXPLORATIONS. Wen-Fu Kao and Durgaprasad Bilagi. Intel Corporation Folsom, CA 95630 Proceedings of Seventh International Workshop on Video Processing and Quality Metrics for Consumer Electronics January 30-February 1, 2013, Scottsdale, Arizona VISUAL QUALITY ASSESSMENT CHALLENGES FOR

More information

Spring 2009 Prof. Hyesoon Kim

Spring 2009 Prof. Hyesoon Kim Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

Saving the Planet Designing Low-Power, Low-Bandwidth GPUs

Saving the Planet Designing Low-Power, Low-Bandwidth GPUs Saving the Planet Designing Low-Power, Low-Bandwidth GPUs Alan Tsai Business Development Manager ARM Saving the Planet? Really? Photo courtesy of NASA. 2 Mobile GPU design is all about power It s not about

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on

More information

How much data can a BluRay hold?

How much data can a BluRay hold? COMPUTER HARDWARE ICS2O MR. EMMELL HOW MUCH SPACE ON YOUR USB? How much RAM in your phone? How much data can a BluRay hold? 1 THAT WHOLE B/KB/MB/GB/TB THING THAT WHOLE B/KB/MB/GB/TB THING So how many Bytes

More information

GPGPU Applications. for Hydrological and Atmospheric Simulations. and Visualizations on the Web. Ibrahim Demir

GPGPU Applications. for Hydrological and Atmospheric Simulations. and Visualizations on the Web. Ibrahim Demir GPGPU Applications for Hydrological and Atmospheric Simulations and Visualizations on the Web Ibrahim Demir Big Data We are collecting and generating data on a petabyte scale (1Pb = 1,000 Tb = 1M Gb) Data

More information

Planets Earth, Mars and Moon Shaders Asset V Documentation (Unity 5 version)

Planets Earth, Mars and Moon Shaders Asset V Documentation (Unity 5 version) Planets Earth, Mars and Moon Shaders Asset V0.4.4 Documentation (Unity 5 version) Charles Pérois - 2015 Introduction 2 Table des matières 1. Introduction...3 2. Release Notes...4 3. How to Use...6 1. Set

More information

Rendering Algorithms: Real-time indirect illumination. Spring 2010 Matthias Zwicker

Rendering Algorithms: Real-time indirect illumination. Spring 2010 Matthias Zwicker Rendering Algorithms: Real-time indirect illumination Spring 2010 Matthias Zwicker Today Real-time indirect illumination Ray tracing vs. Rasterization Screen space techniques Visibility & shadows Instant

More information

Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express

Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express Shader Series Primer: Fundamentals of the Programmable Pipeline in XNA Game Studio Express Level: Intermediate Area: Graphics Programming Summary This document is an introduction to the series of samples,

More information

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Architectures Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1 Overview of today s lecture The idea is to cover some of the existing graphics

More information

The Shadow Rendering Technique Based on Local Cubemaps

The Shadow Rendering Technique Based on Local Cubemaps The Shadow Rendering Technique Based on Local Cubemaps Content 1. Importing the project package from the Asset Store 2. Building the project for Android platform 3. How does it work? 4. Runtime shadows

More information

Jomar Silva Technical Evangelist

Jomar Silva Technical Evangelist Jomar Silva Technical Evangelist Agenda Introduction Intel Graphics Performance Analyzers: what is it, where do I get it, and how do I use it? Intel GPA with VR What devices can I use Intel GPA with and

More information

Profiling and Debugging Games on Mobile Platforms

Profiling and Debugging Games on Mobile Platforms Profiling and Debugging Games on Mobile Platforms Lorenzo Dal Col Senior Software Engineer, Graphics Tools Gamelab 2013, Barcelona 26 th June 2013 Agenda Introduction to Performance Analysis with ARM DS-5

More information

Welcome to Part 3: Memory Systems and I/O

Welcome to Part 3: Memory Systems and I/O Welcome to Part 3: Memory Systems and I/O We ve already seen how to make a fast processor. How can we supply the CPU with enough data to keep it busy? We will now focus on memory issues, which are frequently

More information

Real-time water rendering Introducing the projected grid concept

Real-time water rendering Introducing the projected grid concept Real-time water rendering Introducing the projected grid concept Master of Science thesis Claes Johanson March 2004 Lund University Supervisor: Calle Lejdfors Department of Computer

More information

Drawing Fast The Graphics Pipeline

Drawing Fast The Graphics Pipeline Drawing Fast The Graphics Pipeline CS559 Spring 2016 Lecture 10 February 25, 2016 1. Put a 3D primitive in the World Modeling Get triangles 2. Figure out what color it should be Do ligh/ng 3. Position

More information

Problem Set 4 Part 1 CMSC 427 Distributed: Thursday, November 1, 2007 Due: Tuesday, November 20, 2007

Problem Set 4 Part 1 CMSC 427 Distributed: Thursday, November 1, 2007 Due: Tuesday, November 20, 2007 Problem Set 4 Part 1 CMSC 427 Distributed: Thursday, November 1, 2007 Due: Tuesday, November 20, 2007 Programming For this assignment you will write a simple ray tracer. It will be written in C++ without

More information

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský Real - Time Rendering Graphics pipeline Michal Červeňanský Juraj Starinský Overview History of Graphics HW Rendering pipeline Shaders Debugging 2 History of Graphics HW First generation Second generation

More information

Mobile 3D Devices. -- They re not little PCs! Stephen Wilkinson Graphics Software Technical Lead Texas Instruments CSSD/OMAP

Mobile 3D Devices. -- They re not little PCs! Stephen Wilkinson Graphics Software Technical Lead Texas Instruments CSSD/OMAP Mobile 3D Devices -- They re not little PCs! Stephen Wilkinson Graphics Software Technical Lead Texas Instruments CSSD/OMAP Who is this guy? Involved with simulation and games since 1995 Worked on SIMNET

More information

ECE 574 Cluster Computing Lecture 16

ECE 574 Cluster Computing Lecture 16 ECE 574 Cluster Computing Lecture 16 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 26 March 2019 Announcements HW#7 posted HW#6 and HW#5 returned Don t forget project topics

More information

Lecture 16. Today: Start looking into memory hierarchy Cache$! Yay!

Lecture 16. Today: Start looking into memory hierarchy Cache$! Yay! Lecture 16 Today: Start looking into memory hierarchy Cache$! Yay! Note: There are no slides labeled Lecture 15. Nothing omitted, just that the numbering got out of sequence somewhere along the way. 1

More information