Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

Similar documents
Achieving Console Quality Games on Mobile

Mali Developer Resources. Kevin Ho ARM Taiwan FAE

Profiling and Debugging Games on Mobile Platforms

LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014)

PowerVR Performance Recommendations. The Golden Rules

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

GUERRILLA DEVELOP CONFERENCE JULY 07 BRIGHTON

Optimizing Mobile Games with Gameloft and ARM

PowerVR Hardware. Architecture Overview for Developers

ARM Multimedia IP: working together to drive down system power and bandwidth

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

Mobile HW and Bandwidth

Vulkan Multipass mobile deferred done right

Working with Metal Overview

PowerVR Series5. Architecture Guide for Developers

Unity Software (Shanghai) Co. Ltd.

Table of Contents. Questions or problems?

PowerVR Performance Recommendations. The Golden Rules

GeForce4. John Montrym Henry Moreton

ARM. Mali GPU. OpenGL ES Application Optimization Guide. Version: 3.0. Copyright 2011, 2013 ARM. All rights reserved. ARM DUI 0555C (ID102813)

PowerVR Performance Recommendations The Golden Rules. October 2015

Performance Analysis and Culling Algorithms

Graphics Processing Unit Architecture (GPU Arch)

Render-To-Texture Caching. D. Sim Dietrich Jr.

Mobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools

LOD and Occlusion Christian Miller CS Fall 2011

Easy Decal Version Easy Decal. Operation Manual. &u - Assets

Programming Graphics Hardware

MAXIS-mizing Darkspore*: A Case Study of Graphic Analysis and Optimizations in Maxis Deferred Renderer

Coding OpenGL ES 3.0 for Better Graphics Quality

Rendering Objects. Need to transform all geometry then

ARM. Mali GPU. OpenGL ES Application Optimization Guide. Version: 2.0. Copyright 2011, 2013 ARM. All rights reserved. ARM DUI 0555B (ID051413)

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

CS GAME PROGRAMMING Question bank

DEFERRED RENDERING STEFAN MÜLLER ARISONA, ETH ZURICH SMA/

Developing the Bifrost GPU architecture for mainstream graphics

Deferred Rendering Due: Wednesday November 15 at 10pm

Baback Elmieh, Software Lead James Ritts, Profiler Lead Qualcomm Incorporated Advanced Content Group

Course Recap + 3D Graphics on Mobile GPUs

Case 1:17-cv SLR Document 1-3 Filed 01/23/17 Page 1 of 33 PageID #: 60 EXHIBIT C

Tips and Tricks to Get the Most out of Your Virtual-Reality Experiences in Stingray

Lecture 6: Texture. Kayvon Fatahalian CMU : Graphics and Imaging Architectures (Fall 2011)

Achieving High-performance Graphics on Mobile With the Vulkan API

Unreal Engine 4: Mobile Graphics on ARM CPU and GPU Architecture

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Rasterization Overview

Mali Demos: Behind the Pixels. Stacy Smith

Evolution of GPUs Chris Seitz

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved

Getting fancy with texture mapping (Part 2) CS559 Spring Apr 2017

3D Rendering Pipeline

Vulkan: Architecture positive How Vulkan maps to PowerVR GPUs Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics.

8iUnityPlugin Documentation

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research

POWERVR MBX. Technology Overview

Bifrost - The GPU architecture for next five billion

Optimizing Mobile Games with ARM. Solo Chang Staff Applications Engineer, ARM

PowerVR. Performance Recommendations

DOOM 3 : The guts of a rendering engine

Vulkan on Mobile. Daniele Di Donato, ARM GDC 2016

Rendering Grass with Instancing in DirectX* 10

Inside VR on Mobile. Sam Martin Graphics Architect GDC 2016

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

Game Programming Lab 25th April 2016 Team 7: Luca Ardüser, Benjamin Bürgisser, Rastislav Starkov

The Application Stage. The Game Loop, Resource Management and Renderer Design

POWERVR MBX & SGX OpenVG Support and Resources

Windowing System on a 3D Pipeline. February 2005

Lecture 9: Deferred Shading. Visual Computing Systems CMU , Fall 2013

Streaming Massive Environments From Zero to 200MPH

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

The Bifrost GPU architecture and the ARM Mali-G71 GPU

Practical Performance Analysis Koji Ashida NVIDIA Developer Technology Group

Mali-G72 Enabling tomorrow s technology today

Dominic Filion, Senior Engineer Blizzard Entertainment. Rob McNaughton, Lead Technical Artist Blizzard Entertainment

Pipeline Operations. CS 4620 Lecture Steve Marschner. Cornell CS4620 Spring 2018 Lecture 11

Enhancing Traditional Rasterization Graphics with Ray Tracing. March 2015

Optimized Rendering Techniques Based on Local Cubemaps

Deferred Splatting. Gaël GUENNEBAUD Loïc BARTHE Mathias PAULIN IRIT UPS CNRS TOULOUSE FRANCE.

The Rasterization Pipeline

PowerVR: Getting Great Graphics Performance with the PowerVR Insider SDK. PowerVR Developer Technology

NVIDIA Developer Toolkit. March 2005

Adding Advanced Shader Features and Handling Fragmentation

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Mali-G72: Enabling tomorrow s technology today

The Making of Seemore WebGL. Will Eastcott, CEO, PlayCanvas

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

Pipeline Operations. CS 4620 Lecture 14

CS4620/5620: Lecture 14 Pipeline

Enabling immersive gaming experiences Intro to Ray Tracing

Drawing Fast The Graphics Pipeline

CS 354R: Computer Game Technology

Real-World Applications of Computer Arithmetic

Real-Time Universal Capture Facial Animation with GPU Skin Rendering

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

Monday Morning. Graphics Hardware

Performance OpenGL Programming (for whatever reason)

Transcription:

Optimizing and Profiling Unity Games for Mobile Platforms Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June 1

Agenda Introduction ARM and the presenter Preliminary knowledge Unity Pro Identify the bottleneck CPU Vertex processing Fragment processing Bandwidth The Unity Profiler (Pro only) Example application Overview Scenes breakdown 2

3 Introduction

ARM: Connecting the World What does ARM do? The world's leading semiconductor intellectual property (IP) supplier Powering your consumer electronics since 1993 Over 95% of mobile phones and tablets Over 70% of smart TVs Over 95% of portable gaming consoles Over 80% of digital cameras Who am I and what do I do? Angelo Theodorou Amiga and demoscene lover 3D graphics enthusiast and former game developer Senior Software Engineer in the ARM Mali Ecosystem team At the service of renowned game studios around the world 4

Introduction Preliminary knowledge What is a draw call? A call to a function of the underlying API (e.g. OpenGL ES) to draw something on screen What is a fragment? A candidate pixel which may or may not end up on screen for different reasons (e.g. being discarded, being overwritten, etc.) Differences between opaque and transparent render queue Objects in the first queue are rendered in front to back order to make use of depth testing and minimize the overdraw, transparent ones are rendered afterwards in back to front order to preserve correctness What does batching mean? To group similar draw calls in one call operating on the whole data set Why are mobile platforms different? Immediate vs deferred rendering 5

Introduction Why are mobile platforms different? Desktop platforms Immediate mode: graphics commands are executed when issued Huge amount of bandwidth available between GPU and dedicated video memory (>100 GB/s) No strict limits in size, power consumption or heat generation Mobile platforms Deferred mode: graphics commands are collected by the driver and issued later Tile based: rendering occurs in a small on-chip buffer before being written to memory Bandwidth is severely reduced (~5 GB/s) and transferring data needs a great amount of power Memory is unified and shared between CPU and GPU 6

Introduction Unity Pro Pro only optimization features: Profiler Level of detail Occlusion culling Light probes Static batching Stencil buffer access in shaders Native code plugins support 7

8 Identify the bottleneck

Identify the Bottleneck CPU Too many draw calls Complex scripts or physics Vertex processing Too many vertices Too much computation per vertex Fragment processing Too many fragments and overdraw Too much computation per fragment Bandwidth Big and uncompressed textures High resolution framebuffer 9

Identify the Bottleneck CPU Bound Too many draw calls Dynamic Batching (automatic) Static Batching (Pro only) Frustum Culling (automatic) Occlusion culling (Pro only) Complex scripts Component caching Pool of objects Reduced GUI calls OnBecomeVisible() / OnBecomeInvisible() Timed coroutines instead of per frame updates Complex physics Compound colliders instead of mesh ones Increased duration of Fixed Timestep in the TimeManager 10

Identify the Bottleneck Vertex Bound Too many vertices in geometry Remove unnecessary vertices Remove unnecessary hard edges and UV seams Use LOD switching through LODGroup (Pro only) Frustum culling (automatic) Occlusion culling (Pro only) Too much computation per vertex Use the mobile version of Unity shaders whenever you can 13

Identify the Bottleneck Vertex Bound level of detail Use LODGroup to limit vertex count when geometric detail is not strictly needed (very far or small objects) Geometric objects can be replaced by billboards at long distance 14

Identify the Bottleneck Fragment Bound Overdraw (drawing to each pixel on screen more than once) Drawing your objects from front to back reduces overdraw, thanks to depth testing Limit the amount of transparency in the scene (beware of particles!) Unity has a Render Mode to show the amount of overdraw per pixel Too much computation per fragment Bake as much as you can (lightmaps, light probes, etc.) Contain the number of per-pixel lights Limit real-time shadows (only high-end mobile devices) Try to avoid full screen post-processing Use the mobile version of Unity shaders whenever you can Custom shaders Avoid expensive math functions, use texture lookup, pick the lowest possible precision format 15

Identify the Bottleneck Fragment Bound - overdraw Use the dedicated Render Mode to check the overdraw amount 16

Identify the Bottleneck Bandwidth Bound Use texture compression ETC1 with OpenGL ES 2.0; ETC2 and EAC with OpenGL ES 3.0 ASTC is the latest Khronos standard, supported since Unity 4.3 Use MIP maps More memory but improved image quality (less aliasing artefacts) Optimized memory bandwidth (when sampling from smaller maps) Use 16-bit textures where you can cope with reduced detail Use trilinear and anisotropic filtering in moderation 19

Identify the Bottleneck Bandwidth Bound mipmaps Use the dedicated Render Mode to check mipmap levels 20

21 The Unity Profiler

Unity Profiler Overview Instruments the code to provide detailed per-frame performance data: CPU usage Rendering Memory GPU usage Audio Physics Can profile content running on mobile devices Only available in Unity Pro 22

Unity Profiler CPU Usage Shows CPU utilization for rendering, scripts, physics, garbage collection, etc. Shows detailed info about how the time was spent Enable Deep Profile for additional data about all the function calls occurring in your scripts Manually instrument specific blocks of code with Profiler.BeginSample()and Profiler.EndSample() 23

Unity Profiler Rendering Statistics about the rendering subsystem: Draw calls Triangles Vertices The lower pane shows data similar to the rendering statistics window of the editor 24

Unity Profiler Memory Shows memory used/reserved on a higher level: Unity native allocations Garbage collected Mono managed allocations Graphics and audio total used memory estimation Memory used by the profiler data itself Shows memory used by assets/objects: Textures Meshes Materials Animation clips Audio clips New detailed breakdown since Unity 4.1 25

Unity Profiler GPU Usage Shows a contribution breakdown similar to the CPU profiler: Rendering of opaque objects Rendering of transparent objects Shadows Deferred shading passes Post processing Not yet available on mobile platforms 26

29 Example Application

Example Application Overview Composed of seven different scenes: Geometry (high vertex count, frustum culling) LOD (using level of detail through LODGroup) Particles (transparent rendering) Texture (MIP mapping and compression) Overdraw (alpha blending and alpha test) Lights (vertex and pixel lighting) Atlas (texture atlas) Most of them are based on a Prefab and Instantiate() All instances are created at once in the beginning and deactivated Number of active instances can adapt to maintain a target FPS value Positioning is made in screen space and based on camera aspect 30

31

32 Camera translated and rotated

Geometry Scene Stressing the vertex units with lots of high poly objects You can set the amount of visible objects or a target FPS value More time spent on rendering (green) Increasing the number of visible objects It is possible to rotate and translate the camera to show frustum culling in action (and lack of occlusion culling) Most of the objects are out of camera view, frustum culling is working Most of the objects are hidden behind front ones, but occlusion culling is not enabled 33

34 LOD 2

35 LOD 0

LOD Scene LODGroup with three different levels Triangles and vertices spikes match LOD level switch, draw calls are dependent on frustum culling Switching LOD 36

37

Particles Scene All the rendering is done in the Transparent Queue Geometry Scene Particles Scene Perfect case of geometry batching (one draw call per emitter) 38

Texture Scene Compressed texture with MIP maps (right) takes less memory and looks better than uncompressed one (left) Uncompressed, bilinear filtering ETC compressed, trilinear filtering, MIP mapped 39

Transparent Vertex-Lit shader Transparent Cutout Vertex-Lit shader 40

Overdraw Scene Alpha blended quads are rendered in the transparent queue, alpha tested ones in the opaque queue, similar performance on ARM Mali technology Overdraw render mode 41

Vertex-Lit shader Diffuse shader 42

Lights Scene (1/2) Per-pixel lighting uses a draw call per light (forward rendering) 4 lights, adding more objects 4x250 lights, 1198 draw calls 3x250 lights, 966 draw calls 2x250 lights, 734 draw calls 250 lights, 502 draw calls No lights, 39 draw calls (232 batched) Per-vertex lighting uses one geometric pass per object, lighting information is a vertex attribute Per-pixel: 4x250 lights, 1198 draw calls Per-vertex: 4x250 lights, 270 draw calls 43

Lights Scene (2/2) MeshRenderer.Render takes 5 times more milliseconds when performing per-pixel lighting (one pass per light) Per-pixel lighting Per-vertex lighting 44

45

Atlas Scene (1/2) No Atlas rendering Each instance has its own texture Same material if multiple instances share the same texture One draw call per material (multiple instances) Atlas rendering changing material UV Each instance shares the same texture atlas UV coordinates are changed using maintexturescale and maintextureoffset Leads to a new, unique material per instance and breaks instancing One draw call per instance Atlas rendering changing mesh UV Each instance shares the same texture atlas UV coordinates are changed accessing the instance mesh One draw call for all the instances (single material) 46

Atlas Scene (2/2) 47 No Atlas texture 60 FPS 50 draw calls, 374 batched 21 materials Atlas with material UV 40 FPS 408 draw calls, 0 batched 395 materials No batching, lots of materials! Atlas with mesh UV 60 FPS 35 draw calls, 374 batched 21 materials Batching works, less draw calls

Links Practical Guide to Optimization for Mobiles (Unity Manual) Optimizing Graphics Performance (Unity Manual) Profiler (Unity Manual) Profiler Scripting Reference (Unity Manual) Measuring Performance with the Built-in Profiler (Unity Manual) Introducing the new Memory Profiler (Unity Blog) Occlusion Culling in Unity 4.3 (Unity Blog) Mali Developer Center Mali Graphics Debugger DS-5 Development Studio Streamline 48

Thank You The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners 49

Mali Developer Center A full range of resources on one, easy-access portal: Tools SDKs Drivers Tutorials & Development Guides Sample Code http://malideveloper.arm.com 50