Mobile HW and Bandwidth

Similar documents
PowerVR Hardware. Architecture Overview for Developers

PowerVR Performance Recommendations. The Golden Rules

Saving the Planet Designing Low-Power, Low-Bandwidth GPUs

PowerVR Performance Recommendations The Golden Rules. October 2015

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

Mobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools

Optimizing DirectX Graphics. Richard Huddy European Developer Relations Manager

PowerVR Performance Recommendations. The Golden Rules

Bringing AAA graphics to mobile platforms. Niklas Smedberg Senior Engine Programmer, Epic Games

PowerVR Series5. Architecture Guide for Developers

Optimizing for DirectX Graphics. Richard Huddy European Developer Relations Manager

Coding OpenGL ES 3.0 for Better Graphics Quality

Multimedia in Mobile Phones. Architectures and Trends Lund

Vulkan: Architecture positive How Vulkan maps to PowerVR GPUs Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics.

Graphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university

CSE 167: Introduction to Computer Graphics Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2015

Hardware-driven visibility culling

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014)

Lecture 9: Deferred Shading. Visual Computing Systems CMU , Fall 2013

Profiling and Debugging Games on Mobile Platforms

Achieving Console Quality Games on Mobile

Course Recap + 3D Graphics on Mobile GPUs

Architectures. Michael Doggett Department of Computer Science Lund University 2009 Tomas Akenine-Möller and Michael Doggett 1

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Rendering Grass with Instancing in DirectX* 10

Building scalable 3D applications. Ville Miettinen Hybrid Graphics

POWERVR MBX & SGX OpenVG Support and Resources

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský

Optimizing Mobile Games with ARM. Solo Chang Staff Applications Engineer, ARM

Mali Developer Resources. Kevin Ho ARM Taiwan FAE

CS427 Multicore Architecture and Parallel Computing

Hidden surface removal. Computer Graphics

Parallelizing Graphics Pipeline Execution (+ Basics of Characterizing a Rendering Workload)

Optimization Tips 杜博, 美国高通公司资深工程师

Hardware-driven Visibility Culling Jeong Hyun Kim

The Rasterization Pipeline

GeForce3 OpenGL Performance. John Spitzer

CSE 167: Lecture #5: Rasterization. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012

Mali & OpenGL ES 3.0. Dave Shreiner Jon Kirkham ARM. Game Developers Conference 27 March 2013

Applications of Explicit Early-Z Z Culling. Jason Mitchell ATI Research

ARM Multimedia IP: working together to drive down system power and bandwidth

Starting out with OpenGL ES 3.0. Jon Kirkham, Senior Software Engineer, ARM

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Vulkan Multipass mobile deferred done right

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research

Chapter IV Fragment Processing and Output Merging. 3D Graphics for Game Programming

Next-Generation Graphics on Larrabee. Tim Foley Intel Corp

2: Introducing image synthesis. Some orientation how did we get here? Graphics system architecture Overview of OpenGL / GLU / GLUT

Direct3D 11 Performance Tips & Tricks

Optimizing Mobile Games with Gameloft and ARM

Scheduling the Graphics Pipeline on a GPU

Lecture 6: Texturing Part II: Texture Compression and GPU Latency Hiding Mechanisms. Visual Computing Systems CMU , Fall 2014

Vulkan on Mobile. Daniele Di Donato, ARM GDC 2016

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

POWERVR MBX. Technology Overview

Moving Mobile Graphics Advanced Real-time Shadowing. Marius Bjørge ARM

Module Introduction. Content 15 pages 2 questions. Learning Time 25 minutes

Dominic Filion, Senior Engineer Blizzard Entertainment. Rob McNaughton, Lead Technical Artist Blizzard Entertainment

CS451Real-time Rendering Pipeline

Optimizing Graphics Applications

Graphics Hardware. Graphics Processing Unit (GPU) is a Subsidiary hardware. With massively multi-threaded many-core. Dedicated to 2D and 3D graphics

CS4620/5620: Lecture 14 Pipeline

Many rendering scenarios, such as battle scenes or urban environments, require rendering of large numbers of autonomous characters.

Deferred Splatting. Gaël GUENNEBAUD Loïc BARTHE Mathias PAULIN IRIT UPS CNRS TOULOUSE FRANCE.

ARM. Mali GPU. OpenGL ES Application Optimization Guide. Version: 3.0. Copyright 2011, 2013 ARM. All rights reserved. ARM DUI 0555C (ID102813)

Broken Age's Approach to Scalability. Oliver Franzke Lead Programmer, Double Fine Productions

The F-Buffer: A Rasterization-Order FIFO Buffer for Multi-Pass Rendering. Bill Mark and Kekoa Proudfoot. Stanford University

Here s the general problem we want to solve efficiently: Given a light and a set of pixels in view space, resolve occlusion between each pixel and

8/5/2012. Introduction. Transparency. Anti-Aliasing. Applications. Conclusions. Introduction

Spring 2009 Prof. Hyesoon Kim

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

PowerVR. Performance Recommendations

Beyond Programmable Shading 2012

Spring 2011 Prof. Hyesoon Kim

Graphics, Mobile Computing, APIs and Life

GeForce4. John Montrym Henry Moreton

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

ARM. Mali GPU. OpenGL ES Application Optimization Guide. Version: 2.0. Copyright 2011, 2013 ARM. All rights reserved. ARM DUI 0555B (ID051413)

POWERVR. 3D Application Development Recommendations

Bringing it all together: The challenge in delivering a complete graphics system architecture. Chris Porthouse

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

Performance Analysis and Culling Algorithms

CS GPU and GPGPU Programming Lecture 2: Introduction; GPU Architecture 1. Markus Hadwiger, KAUST

Mobile Graphics Ecosystem. Tom Olson OpenGL ES working group chair

Computer Graphics. Lecture 9 Hidden Surface Removal. Taku Komura

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

MSAA- Based Coarse Shading

Intel Core 4 DX11 Extensions Getting Kick Ass Visual Quality out of the Latest Intel GPUs

GUERRILLA DEVELOP CONFERENCE JULY 07 BRIGHTON

The Bifrost GPU architecture and the ARM Mali-G71 GPU

The Graphics Pipeline

EECS 487: Interactive Computer Graphics

Hello & welcome to this last part about many light rendering on mobile hardware, i.e. on devices like smart phones and tablets.

FROM VERTICES TO FRAGMENTS. Lecture 5 Comp3080 Computer Graphics HKBU

CS450/550. Pipeline Architecture. Adapted From: Angel and Shreiner: Interactive Computer Graphics6E Addison-Wesley 2012

Real-Time Reyes: Programmable Pipelines and Research Challenges. Anjul Patney University of California, Davis

Visibility: Z Buffering

Achieving High-performance Graphics on Mobile With the Vulkan API

Transcription:

Your logo on white Mobile HW and Bandwidth Andrew Gruber Qualcomm Technologies, Inc.

Agenda and Goals Describe the Power and Bandwidth challenges facing Mobile Graphics Describe some of the Power Saving approaches used in Mobile Graphics to deal with these challenges By no means a systematic account, just a sample of techniques used by major providers Provide hints and best practices to allow SW to take advantage of the above approaches.

LPDDRx Bandwidth Evolution LPDDR4 PC-DDR4 BW LPDDR3

Mobile Power Consumption for a Game Game running at 1080p / 60 fps DDR consumption is ~16% for this gaming use case on Qualcomm Snapdragon processors. Comparable to internal GPU power and CPU power consumption Mobile device - Power Breakdown Typical OpenGL ES 3.1 based game Rest of the System DDR 16% CPU 24% GPU 16% Snapdragon is a product of Qualcomm Technologies, Inc.

Resolutions Mobile Resolutions are high and increasing 1280x720 1920x1080 2560x1440 Base Resolution 2.25x more pixels 4x more pixels 3840x2160 9x more pixels

Binning/Tiling Architecture to Save Memory Bandwidth GPU has a dedicated fast tile buffer. The rendering surface is split up into bins. Rendering pass draws only pixels that are visible (saving GPU cycles and power) The GPU efficiently burst writes all blended pixels from the tile buffer as a single layer to the frame buffer in system memory referred to as a Resolve Note that typically there is no Depth traffic to system memory and any MSAA color traffic is first downfiltered before being written to System Memory. Rendering Order Current Bin Next Bin Opaque Alpha Blended Alpha Blended Tiled Texture Read Texture Read Texture Read Resolve (Write) Direct Texture Read FB Write FB Read Texture Read FB Write FB Read Texture Read FB Write

Qualcomm Technologies Approach to Binning/Tiling The only output of Binning Pass is a compressed 1 bit/primitive visibility stream Avoids write bandwidth for transformed positions and overflow/ management of transformed VBO.

Tiling Gotcha s Use Frame Buffer Discard to indicate that old data will be overwritten Otherwise we need to copy ( unresolve ) the old frame buffer back into the Tile Buffer Frame Buffer Clear works as well but means we have to waste time clearing the Tile Buffer Mid frame calls of glteximage2d, glbufferdata, glreadpixels, glcopyteximage2d, etc., force the driver to Save/Restore GPU Tile Buffer Fast, local, internal memory where tiles are rendered Unresolve of Old Frame Mid Frame Partial Resolve Mid Frame Render-to-Texture Mid Frame Partial Unresolve Final Resolve Frame Buffer Slow external memory

Bandwidth Saving Technique Deferred Rending Hidden Surface Removal or Deferred Rendering can avoid rendering occluded pixels even for non-sorted command streams. Works by deferring color render until Z value is known similar to a Z prerender. Doesn t work for all cases -- Alpha Blending or Pixel Kill are problematic Still Depth sort when possible. Allows Early Z to remove a lot of processing Image used with permission of Imagination Technologies

Bandwidth Saving Technique Forward Pixel Kill Use Early Z test to reach into processing pipeline and kill pixels that are still being processed. Many of the benefits of Deferred Rendering Kills even with back-to-front Ordering Can kill pixels even if the tile has Alpha Blending. Effectiveness drops as tile size or tile complexity increases Image used with permission of ARM

Bandwidth Saving Technique Tessellation Supported in AEP/OpenGL ES 3.1 Provides the ability to render very smooth and high detailed scenes Tessellation OFF Tessellation ON

Bandwidth Saving Technique - 1. Tessellate on Binning Pass Tessellation 3 Different Possibilities and Bounding Box Extension I. Works but requires extra bandwidth to Save/Read tessellated Vertices 2. Tessellate on Render Pass I. Saves bandwidth - but requires tessellated primitive to appear in all bins II. Extra tessellation cycles in each Render Pass 3. Tessellate on Both Passes I. Limits extra Render Pass work for non-visible Bins II. But pay extra tessellation cycles during Binning 4. Implementing a Bounding Box extension can help limit Render pass work in (2) or Binning Pass work in (3).

Bandwidth Saving Technique - Texture Compression Texture compression is a critical tool for reducing bandwidth - which improves performance and lowers power consumption All textures containing color data should be compressed using one of the recommended formats below. Original ASTC Compression Target API Open GL ES 2.0 Open GL ES 3.0 Open GL ES 3.1 Format ATC ETC2 ETC2 or ASTC 24 bpp 8 bpp 3.56 bpp 2 bpp

Bandwidth Decompression Bandwidth Compression Bandwidth Decompression Bandwidth Saving Technique Render Target Compression Lossless means that can be invoked anytime the CPU doesn t need access. Doesn t save space -- only memory bandwidth. End-to-End (understood by both Texture Unit and Display) means displayable Targets can be compressed and saves bandwidth on the Scan-out as well. GPU DDR Display Compressed image Full size image Full size image

Summary Bandwidth is becoming the dominant performance and power consumption consideration for mobile graphics Mobile chips use tricks to try minimize bandwidth as much as possible. Please cooperate with us by using mobile friendly algorithms and using compressed surfaces.