MAXIS-mizing Darkspore*: A Case Study of Graphic Analysis and Optimizations in Maxis Deferred Renderer

Similar documents
Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

Deferred Rendering Due: Wednesday November 15 at 10pm

Case Study: Optimizing King of Soldier* with Intel Graphics Performance Analyzers on Intel HD Graphics 4000

GUERRILLA DEVELOP CONFERENCE JULY 07 BRIGHTON

Analyze and Optimize Windows* Game Applications Using Intel INDE Graphics Performance Analyzers (GPA)

Rendering Grass with Instancing in DirectX* 10

Render-To-Texture Caching. D. Sim Dietrich Jr.

There are two lights in the scene: one infinite (directional) light, and one spotlight casting from the lighthouse.

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Dominic Filion, Senior Engineer Blizzard Entertainment. Rob McNaughton, Lead Technical Artist Blizzard Entertainment

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

LOD and Occlusion Christian Miller CS Fall 2011

DOOM 3 : The guts of a rendering engine

The Application Stage. The Game Loop, Resource Management and Renderer Design

Intel Core 4 DX11 Extensions Getting Kick Ass Visual Quality out of the Latest Intel GPUs

Applications of Explicit Early-Z Z Culling. Jason Mitchell ATI Research

Computer Graphics (CS 543) Lecture 10: Normal Maps, Parametrization, Tone Mapping

Dynamic Resolution Rendering

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský

Evolution of GPUs Chris Seitz

Chapter Answers. Appendix A. Chapter 1. This appendix provides answers to all of the book s chapter review questions.

NVIDIA Tools for Artists

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

A bit more Deferred - CryEngine 3. Triangle Game Conference 2009 Martin Mittring Lead Graphics Programmer

Rendering Objects. Need to transform all geometry then

TSBK03 Screen-Space Ambient Occlusion

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

CS 354R: Computer Game Technology

DX10, Batching, and Performance Considerations. Bryan Dudash NVIDIA Developer Technology

Direct3D API Issues: Instancing and Floating-point Specials. Cem Cebenoyan NVIDIA Corporation

Game Programming Lab 25th April 2016 Team 7: Luca Ardüser, Benjamin Bürgisser, Rastislav Starkov

Deus Ex is in the Details

Vulkan Multipass mobile deferred done right

Software Occlusion Culling

Per-Pixel Lighting and Bump Mapping with the NVIDIA Shading Rasterizer

Shadows for Many Lights sounds like it might mean something, but In fact it can mean very different things, that require very different solutions.

CocoVR - Spherical Multiprojection

PowerVR Hardware. Architecture Overview for Developers

NVIDIA Developer Toolkit. March 2005

CMSC427 Advanced shading getting global illumination by local methods. Credit: slides Prof. Zwicker

Getting fancy with texture mapping (Part 2) CS559 Spring Apr 2017

Increase your FPS. with CPU Onload Josh Doss. Doug McNabb.

Engine Development & Support Team Lead for Korea UE4 Mobile Team Lead

Soft Particles. Tristan Lorach

Bringing Hollywood to Real Time. Abe Wiley 3D Artist 3-D Application Research Group

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

Game Architecture. 2/19/16: Rasterization

Object Space Lighting. Dan Baker Founder, Oxide Games

Pump Up Your Pipeline

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

CS230 : Computer Graphics Lecture 4. Tamar Shinar Computer Science & Engineering UC Riverside

Module Contact: Dr Stephen Laycock, CMP Copyright of the University of East Anglia Version 1

Opaque. Flowmap Generator 3

Increase your FPS with CPU Onload

Snow Shader R&D in UDK I3 DLC

3D Rendering Pipeline

Nonphotorealism. Christian Miller CS Fall 2011

Wednesday, July 24, 13

Real-Time Universal Capture Facial Animation with GPU Skin Rendering

Applications of Explicit Early-Z Culling

Com S 336 Final Project Ideas

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

The Graphics Pipeline

Canonical Shaders for Optimal Performance. Sébastien Dominé Manager of Developer Technology Tools

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y.

3D Authoring Tool BS Content Studio supports Deferred Rendering for improved visual quality

Black Desert Online. Taking MMO Development to the Next Level. Dongwook Ha Gwanghyeon Go

Streaming Massive Environments From Zero to 200MPH

Textures. Texture coordinates. Introduce one more component to geometry

Optimal Shaders Using High-Level Languages

Advanced 3D Game Programming with DirectX* 10.0

Performance OpenGL Programming (for whatever reason)

Case 1:17-cv SLR Document 1-3 Filed 01/23/17 Page 1 of 33 PageID #: 60 EXHIBIT C

Drawing Fast The Graphics Pipeline

Computer Graphics (CS 543) Lecture 10: Soft Shadows (Maps and Volumes), Normal and Bump Mapping

Mali Demos: Behind the Pixels. Stacy Smith

Craig Peeper Software Architect Windows Graphics & Gaming Technologies Microsoft Corporation

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

Technical Report. Mesh Instancing

Programming Graphics Hardware

Shaders in Eve Online Páll Ragnar Pálsson

Realistic and Fast Cloud Rendering in Computer Games. Niniane Wang Software Engineer Microsoft Flight Simulator (now at Google Inc) Intro Video

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

Shaders. Slide credit to Prof. Zwicker

Enhancing Traditional Rasterization Graphics with Ray Tracing. October 2015

CS451Real-time Rendering Pipeline

Michal Valient Lead Tech Guerrilla Games

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015

Advanced Deferred Rendering Techniques. NCCA, Thesis Portfolio Peter Smith

BitSquid Tech Benefits of a data-driven renderer. Tobias Persson GDC 2011

AGDC Per-Pixel Shading. Sim Dietrich

PROFESSIONAL. WebGL Programming DEVELOPING 3D GRAPHICS FOR THE WEB. Andreas Anyuru WILEY. John Wiley & Sons, Ltd.

Tutorial on GPU Programming #2. Joong-Youn Lee Supercomputing Center, KISTI

PowerVR Performance Recommendations. The Golden Rules

Last Time. Why are Shadows Important? Today. Graphics Pipeline. Clipping. Rasterization. Why are Shadows Important?

Lecturer Athanasios Nikolaidis

Transcription:

MAXIS-mizing Darkspore*: A Case Study of Graphic Analysis and Optimizations in Maxis Deferred Renderer A New Gaming Experience Made Possible With Processor Graphics Released in early 2011, the 2nd Generation Intel Core processors (codenamed Sandy Bridge ) have fundamentally changed the PC gaming landscape, with processor and graphics merged into a single piece of silicon. The tighter integration of graphics into the processor has revolutionized gaming performance and made your favorite games run and look great on mainstream machines. As a developer, the market for your games has opened up immensely. Targeting the mainstream is possible and Intel provides you the tools to help achieve success in this market. Intel Graphics Performance Analyzers (GPA) 4.0 was released at GDC 2011 Figure 1: PC Volumes Dwarf Consoles through 2014 with full support for this next generation of processors. Game developers including the Darkspore team at Maxis have taken full advantage of Intel GPA 4.0 features to make sure their game fully utilizes the new processor graphics. 1

Darkspore is Designed for Everyone Darkspore is best described as an online action RPG where you control a squad of 3 heroes to fight the Darkspore, an evil infestation of creatures. The game includes a version of the award winning Spore Editor where you can fully customize not just basic features of the characters like color, height, textures, but vertices in the characters meshes. With Darkspore, Maxis is targeting mainstream to high end graphics hardware. At the Game Developers Conference 2011, David Lee Swenson, the Lead Engineer on Darkspore, presented how he used Intel GPA 4.0 to help achieve this goal. Darkspore Rendering For Darkspore, Maxis implemented a variation of the light pre-pass renderer described by [Engel 2008]. One main advantage of using a light pre-pass renderer is to decouple lighting from the scene geometry. In a forward renderer, lighting is computed at the time the object is drawn. In a light pre-pass renderer, light is accumulated in an off screen buffer and either applied in a post pass or sampled in a final pass per object. The Darkspore renderer is composed of three main passes: deferred pass, lighting pass, and final pass. The deferred pass saves the material data for opaque objects in the scene, the light pass saves all lighting calculations into a light buffer, and the final pass uses the results of the previous passes to create the final frame. The deferred pass uses 2 render targets. The first render target is composed of world normals with a gloss term: Normals (RGB) + Gloss (A). Figure 2: First render target = world space normals (RGB) + gloss (A) Depth ([R*256]G), Specular Power (B), and a Toon ID (A) for a toon-style character outline are rendered to the second target of the deferred pass. Figure 3: Second render target = Depth ([R*256]G) + Specular Power (B) + ToonID (A) Following the deferred pass, there s a lighting pass that renders up to 6 parallel lights as well as cloud shadows and the main shadow. Then, each point or spot light are rendered with gobos and/or shadows to a 16F light buffer: Diffuse (RGB) + Specular (A). 2

Figure 4: Light buffer = Diffuse (RGB) + Specular (A) During the final pass, each object is rendered again sampling the light buffer. Values for areas intended to glow are written to a second target. Then, the glow target is downsampled and blurred back together with particles and post-processing effects like fog, distortion, and the death effect. Figure 5: Final pass = Color + Glow + Post FX + Particles + UI At this point all it needs is the UI and the final frame is complete. Darkspore and Intel GPA Figure 6: The final frame Before Intel GPA, the Darkspore team at Maxis was using a long list of tests and settings to understand and debug game performance. With Intel GPA, the game can be run with the applicable command line options. Once a frame is captured, it s just a matter of setting the X and Y axis to the appropriate metrics to get a useful profile of performance. 3

Figure 7: Darkspore with command line options Since Darkspore was known to be pixel shader bound, the Darkspore team typically would set the X axis to the PS Duration metric and the Y axis to GPU Duration. With these metrics, the taller the bar, the more GPU time it is taking and the wider the bar the more pixel shader time. Figure 8: Initial performance graph with four expensive calls The four calls labeled above (A, B, C, D) are the most expensive in this frame capture. Calls A and C correspond to the deferred and final pass for the blood decals. Call B corresponds to the parallel lights, cloud shadows, and shadows. Call D runs the edge detection that was originally a bigger part of the game but has since been moved to the high spec configuration. Looking at the shaders for the blood decal calls (A and C), a few issues were found and fixed by the Darkspore team: Tiling of decals was supported but never used Vectors were normalized that were used only for a cubemap lookup A Fresnel term was adding very little to the scene, given the fixed camera angle Alpha test was implemented with both a clip instruction and blending Normal calculation was overly complex with values that could be moved to the vertex shader 4

After optimizing the decal shader and moving character shading to the high spec, re-profiling Darkspore only shows Call B that corresponds to parallel lights and shadows. Given the amount of work done in Call B, it is expected to be expensive but grouping all these tasks into one call amortizes the cost of picking up the normal and depth values and recovering the position. Hot Loading Shaders in Frame Analyzer The Darkspore team used Frame Analyzer to verify the changes made to the decal shaders had the positive impact they expected as shown above. But, they also took advantage of the hot loading shaders feature of Frame Analyzer to test changes on the fly. Frame Analyzer allows for replacement of shaders in HLSL or shader assembly for selected calls. By hot loading shaders in Frame Analyzer, you can immediately see the performance difference of the changes without having to capture another frame and hopefully recreate enough of the same events in the scene to make it comparable. Using this hot loading functionality, the Darkspore team was able to rapidly test changes made to the decal shaders. After loading the modified decal pixel shader for calls A and C, the deferred and final pass draw calls, Frame Analyzer showed a 30% and 24% improvement respectively for these calls. Figure 9: Re-profiling Darkspore after optimizations to the decal shader They Drew First (Deferred) Blood Looking further at these two calls that draw the blood decals in Frame Analyzer, we can see their volumes if we set the render state to wireframe.the blood decals are effectively writing all of the pixels in the volume because there is no test in place to early out in the pixel pipeline. The stencil buffer can be used to kill pixels in the final pass. Figure 10: 30% improvement on Call A with modified decal shader Figure 11: Blood decals writing to too many pixels 5

After making the change to the final pass and doing a simple frame rate check at the beginning of the level, there was a noticeable frame rate improvement of about 2 FPS. In Frame Analyzer, this new stencil write test can be set on the decal calls to fully understand the effect: 1. Select both calls A and C (deferred and final pass) and setting STENCILENABLE to true. 2. Then for call A, set STENCILFUNC to D3DCMP_ALWAYS, the STENCILPASS to D3DSTENCILOP_REPLACE, and STENCILREF to 2. 3. For call C, set STENCILFUNC to D3DCMP_EQUAL and the STENCILREF to 2. Figure 12: Blood decals now being stenciled correctly Other Optimizations for Darkspore The Darkspore team made several miscellaneous optimizations to the trees, terrain, character detail, and render system. The forest levels in the game had lots of trees. Maxis found all the trees models had roots below the ground that were being rendered, adding unnecessary polygons that no one ever saw. These were promptly removed saving processing time. Figure 13: Dense tree root geometry was drawn but not visible 6

In Darkspore, the terrain mixes 4 textures together per pass. The artists have control of which textures are mixed per vertex. Large sections were found to only need one texture instead of mixing four. These triangles that had only one texture were instead rendered with a simpler material. This was a big win in some cases and smaller in others, but always a win. Figure 14: Not all landscape triangles require blended textures With the Spore Editor, the player can customize their squad creatures with countless combinations of parts and equipment. This results in some very detailed but also very polygon heavy creatures. The wireframe of the high LOD (level of detail) for a player creature shows the high polygon density of the creatures models. NPCs were found to be at least this dense. Surprisingly, when these creatures got reduced to the one triangle per pixel level, they were hurting both pixel and vertex shader performance because of the way most graphics parts allocate 4 pixels to a quad and a minimum of one quad per triangle. This resulted in a waste of 3/4ths of the pixel shader performance. Figure 15: Character model with high level of detail 7

As part of the light pre-pass renderer, normals were originally saved in view space. The view space normals only required two channels but weren t worth the cost of pixel shader instructions to recover the normal. Also, a world space normal was needed for reflective/refractive objects which resulted in a transform in the pixel shader, which would be a really bad place for an extra matrix transform. In the end, it was worth an extra channel to keep the normal in world space and avoid adding the extra matrix transform to the pixel shader. Conclusion References Figure 16: World space normal used in reflective/refractive objects Using the various features available in Intel GPA 4.0, the Darkspore team was able to discover and fix bottlenecks in their graphics pipeline. At the end of the day, many of the optimizations made for mainstream graphics improved the overall gaming experience. With these optimizations in place, Darkspore runs at well over 30 FPS on the 2nd Generation Intel Core processors (codenamed Sandy Bridge ). Intel(r) Graphics Performance Analyzers: www.intel.com/software/gpa Engel, Wolfgang. Light Pre-Pass Renderer. March 16, 2008. http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html Authors David Lee Swenson (Electronic Arts) is a 20 year software industry veteran. He s Maxis lead engineer on new rendering architecture and art pipelines for Darkspore. Previously, David was responsible for the Spore environment rendering and terraforming systems. He has also worked on console and PC titles at LucasArts, 3DO, Sierra On-Line, Orion, and MediaFactory. Omar A Rodriguez (Intel Corporation) a graphics software engineer in the Intel Software and Services Group, where he supports Intel graphics solutions in the Visual Computing Software Division. He holds a B.S. in Computer Science from Arizona State University. Omar is not the lead guitarist for the Mars Volta. Intel GPA helps developers utilize the full performance potential of entertainment platforms, including (though not limited to) Intel Core processor-based PCs with Intel HD Graphics and Intel Atom processor-based devices. The 4.1 release supports workloads from the newest Windows graphics API, Microsoft DirectX* 11. Refer to our Optimization Notice for more information regarding performance and optimization choices in Intel software products. Copyright 2011 Intel Corporation. All rights reserved. Intel, the Intel logo, and Intel Core are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Printed in USA 07/28/11 8