Baback Elmieh, Software Lead James Ritts, Profiler Lead Qualcomm Incorporated Advanced Content Group

Similar documents
Building scalable 3D applications. Ville Miettinen Hybrid Graphics

Optimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June

Real - Time Rendering. Pipeline optimization. Michal Červeňanský Juraj Starinský

Graphics Performance Optimisation. John Spitzer Director of European Developer Technology

Analyze and Optimize Windows* Game Applications Using Intel INDE Graphics Performance Analyzers (GPA)

Feeding the Beast: How to Satiate Your GoForce While Differentiating Your Game

Mobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools

LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014)

Copyright Khronos Group, Page Graphic Remedy. All Rights Reserved

OpenGL on Android. Lecture 7. Android and Low-level Optimizations Summer School. 27 July 2015

CS451Real-time Rendering Pipeline

POWERVR MBX & SGX OpenVG Support and Resources

Whiz-Bang Graphics and Media Performance for Java Platform, Micro Edition (JavaME)

The Ultimate Developers Toolkit. Jonathan Zarge Dan Ginsburg

GeForce3 OpenGL Performance. John Spitzer

Module 13C: Using The 3D Graphics APIs OpenGL ES

Real-Time Universal Capture Facial Animation with GPU Skin Rendering

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research

Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson

Rendering Objects. Need to transform all geometry then

Profiling and Debugging Games on Mobile Platforms

Mali Developer Resources. Kevin Ho ARM Taiwan FAE

The Application Stage. The Game Loop, Resource Management and Renderer Design

PROFESSIONAL. WebGL Programming DEVELOPING 3D GRAPHICS FOR THE WEB. Andreas Anyuru WILEY. John Wiley & Sons, Ltd.

PowerVR: Getting Great Graphics Performance with the PowerVR Insider SDK. PowerVR Developer Technology

Viewport 2.0 API Porting Guide for Locators

Adding Advanced Shader Features and Handling Fragmentation

Spring 2011 Prof. Hyesoon Kim

General Purpose Computation (CAD/CAM/CAE) on the GPU (a.k.a. Topics in Manufacturing)

Multimedia Platform. Mainstream wireless multimedia expands globally with the industry s first single-chipset solution

GeForce4. John Montrym Henry Moreton

Copyright Khronos Group Page 1

Mattan Erez. The University of Texas at Austin

How to Work on Next Gen Effects Now: Bridging DX10 and DX9. Guennadi Riguer ATI Technologies

Spring 2009 Prof. Hyesoon Kim

Adaptive Point Cloud Rendering

Working with Metal Overview

Programming Graphics Hardware

Accelerating Realism with the (NVIDIA Scene Graph)

EE 4702 GPU Programming

Dave Shreiner, ARM March 2009

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

NVIDIA Developer Tools for Graphics and PhysX

A fixed-point 3D graphics library with energy-efficient efficient cache architecture for mobile multimedia system

Coming to a Pixel Near You: Mobile 3D Graphics on the GoForce WMP. Chris Wynn NVIDIA Corporation

CHAPTER 1 Graphics Systems and Models 3

Optimisation. CS7GV3 Real-time Rendering

Practical Performance Analysis Koji Ashida NVIDIA Developer Technology Group

CS 498 VR. Lecture 19-4/9/18. go.illinois.edu/vrlect19

Lecture 2. Shaders, GLSL and GPGPU

PERFORMANCE OPTIMIZATIONS FOR AUTOMOTIVE SOFTWARE

COMP 4801 Final Year Project. Ray Tracing for Computer Graphics. Final Project Report FYP Runjing Liu. Advised by. Dr. L.Y.

RSX Best Practices. Mark Cerny, Cerny Games David Simpson, Naughty Dog Jon Olick, Naughty Dog

M3G Overview. Tomi Aarnio Nokia Research Center

Graphics Processing Unit Architecture (GPU Arch)

Graphics Hardware. Instructor Stephen J. Guy

Rendering. Converting a 3D scene to a 2D image. Camera. Light. Rendering. View Plane

Squeezing Performance out of your Game with ATI Developer Performance Tools and Optimization Techniques

Coding OpenGL ES 3.0 for Better Graphics Quality

Pump Up Your Pipeline

PowerVR Performance Recommendations. The Golden Rules

Windowing System on a 3D Pipeline. February 2005

1.2.3 The Graphics Hardware Pipeline

Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1

Models and Architectures

The Rasterization Pipeline

3/1/2010. Acceleration Techniques V1.2. Goals. Overview. Based on slides from Celine Loscos (v1.0)

2.11 Particle Systems

Render-To-Texture Caching. D. Sim Dietrich Jr.

CS427 Multicore Architecture and Parallel Computing

Rendering Algorithms: Real-time indirect illumination. Spring 2010 Matthias Zwicker

CSE 167: Lecture #4: Vertex Transformation. Jürgen P. Schulze, Ph.D. University of California, San Diego Fall Quarter 2012

Tutorial on GPU Programming #2. Joong-Youn Lee Supercomputing Center, KISTI

Saving the Planet Designing Low-Power, Low-Bandwidth GPUs

Falanx Microsystems. Company Overview

The Illusion of Motion Making magic with textures in the vertex shader. Mario Palmero Lead Programmer at Tequila Works

Real-Time Graphics Architecture

ARM. Mali GPU. OpenGL ES Application Optimization Guide. Version: 2.0. Copyright 2011, 2013 ARM. All rights reserved. ARM DUI 0555B (ID051413)


Mobile Graphics Ecosystem. Tom Olson OpenGL ES working group chair

More frames per second. Alex Kan and Jean-François Roy GPU Software

NVIDIA Developer Toolkit. March 2005

Evolution of GPUs Chris Seitz

GPU Ray Tracing at the Desktop and in the Cloud. Phillip Miller, NVIDIA Ludwig von Reiche, mental images

Chapter Answers. Appendix A. Chapter 1. This appendix provides answers to all of the book s chapter review questions.

Mobile 3D Devices. -- They re not little PCs! Stephen Wilkinson Graphics Software Technical Lead Texas Instruments CSSD/OMAP

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1

NVIDIA Parallel Nsight. Jeff Kiel

Interactive Computer Graphics A TOP-DOWN APPROACH WITH SHADER-BASED OPENGL

Advances in Qt 3D. Kévin Ottens, Software Craftsman at KDAB

LOD and Occlusion Christian Miller CS Fall 2011

Radeon ProRender and Radeon Rays in a Gaming Rendering Workflow. Takahiro Harada, AMD 2017/3

ARM. Mali GPU. OpenGL ES Application Optimization Guide. Version: 3.0. Copyright 2011, 2013 ARM. All rights reserved. ARM DUI 0555C (ID102813)

CS4620/5620: Lecture 14 Pipeline

Advanced Imaging Applications on Smart-phones Convergence of General-purpose computing, Graphics acceleration, and Sensors

Module Introduction. Content 15 pages 2 questions. Learning Time 25 minutes

Introduction to Shaders.

Overview. Technology Details. D/AVE NX Preliminary Product Brief

NVIDIA nfinitefx Engine: Programmable Pixel Shaders

Performance OpenGL Programming (for whatever reason)

Transcription:

Introduction ti to Adreno Tools Baback Elmieh, Software Lead James Ritts, Profiler Lead Qualcomm Incorporated Advanced Content Group

Qualcomm HW Accelerated 3D: Adreno Moving content-quality forward requires hardware acceleration Up to 1024x768 screen-resolution by mid-2009 Blend effects and composition of 3D with other media types Multiple texture support with combiners Qualcomm is enabling Hardware 3D on all its chipset tiers 2

Adreno Product Family OpenGL-ES 1.0 Adreno 100 and Adreno 110 Commercial for 3 years, high volumes Entry-level hardware acceleration OpenGL-ES 1.0+/OpenGL-ES 1.1 Adreno 120 and Adreno 130 Commercial now in Asia, entering US by July High performance fixed function pipeline with texture combiners and matrix palette extensions OpenGL-ES 2.0 Adreno 200, Adreno 210 and Adreno 220 Commercial end of this year High performance, flexible shader pipeline 3 3

Adreno Graphics Platform So many devices 4

Challenge for developers Current HW accelerated 3D Content t Main SKU is Software HW is treated as an incremental feature: e.g. bilinear filtering, marginally higher-res textures Engines that do support HW do so with orthogonal render paths Market is shifting, will you be able to make the transition from incremental hardware support, to full support? Problems It s an embedded device, you get the best performance from coding to the hardware But: HW manufacturers don t make it easy 5

Architecture of one platform... Adreno 100, 110 (App) (Vertex T&L) (Rasterization) (Frame Swaps) ARM9 CPU QDSP-4 Adreno 100/110 GPU Disp Ctrl SYS AHB APPS AHB DMA AHB MemCtrl 6

Another platform... Adreno 130 AHB ARM11 MM CPU Adreno 130 GPU Disp Ctrl v2 AXI-1 AXI-2 External MemCtrl Internal MemCtrl Problem: Doing R&D for each platform is way too expensive 7

Qualcomm s Adreno Tools Adreno tools Lowering cost of tuning content for HW All the tools necessary to create cutting-edge g 3D content for Adreno platforms Create Content Export Optimize i Geometry & Textures Code Tune Performance Deploy Simulate 8

Adreno Tool Packages Developed alongside the hardware: brought up alongside drivers, and system QX Engine SDK Maya, 3DS Max Exporters Particle System Editor QStrip Triangle Stripper QXTextureBuilder texture optimizer Full rendering and animation engine Full F Source Code Adreno Profiler On-Device profiling Directed analysis Hardware metric access API Traces Powerful debugging features 9

Performance optimization i using Adreno Tools

First Prototype Woes 11 After porting, I m getting 5 FPS?!

Graphics Pipeline (App) (Vertex T&L) (Rasterization) (Frame Swaps) ARM9 CPU QDSP-4 Adreno 100/110 GPU Disp Ctrl AHB SYS AHB ARM11 MM CPU Adreno 130 GPU Disp Ctrl v2 APPS AHB DMA AHB AXI-1 AXI-2 MemCtrl External MemCtrl Internal MemCtrl 12

Introducing Adreno Profiler USB Quickly identify and analyze bottlenecks on multiple platforms 13

Driver Instrumentation Adreno Profiler PC Client COM Layer System Metrics API Trace GPU Metrics Overrides 3D Graphics Driver Graphics driver is extensively instrumented No application changes required Available on commercial devices 14

Case Study: Neocore Optimization Platform: Q3Dimension R4.0 (MSM7201) Initial performance: 5-10fps After optimization: 30fps 15

Demo USB 16

Graphics Pipeline 17

QXTextureBuilder Tool Easily converted all textures to mipmapped + ATITC Memory footprint reduced by 75% Huge GPU caching benefit 18

QStrip Tool Converted all meshes from discrete triangles to triangle strips Also enabled frustum culling in QX Engine /* compute strip */ Qstrip* pstripify = QstripCreate(); ushort* dstrip = QstripComputeStrips(pStripify, indexnum, indexdata); ulong nstriplen = QstripGetStripLength(pStripify); 19

Adreno Profiler USB HW and system-level real-time performance metrics Powerful frame analysis and debugging features Supports current and future Qualcomm Adreno platforms, including upcoming OpenGLES v2-based cores Available Today on commercial devices 20

Thank You!

To Join Qualcomm s Adreno Developer Program, simply email: Tim Leland tleland@qualcomm.com 3D HW enabled Handsets Better Ecosystem increases revenue for all partners Better 3D Content Higher ARPU Better Consumer Value 22

QX Engine 1.2 Features for Adreno 120/130 Layered Textures DOT3 Bumpmapping Specular Mapping Particle Engine & Authoring System 23 New APIs and utilities

Case Study Step 1 Neocore Build 1 Metrics graph: 11 FPS Profiler statistics: Statistics show textures are not ideal Forcing 1x1 Textures jumps performance to 22 FPS Bottleneck is in the texture fetch stage Force 1x1 Textures Unoptimized Textures 24

Case Study Step 2 Neocore Build 2 Used QX Texture Converter to create ATI_TC, Mipmapped textures Metrics graph: 20 FPS We have moved the bottleneck, it is no longer in the texture pipeline Statistics gathering hints that triangle-strips are not being used Pulling back the camera shows unnecessary off-camera rendering Our bottleneck is in the front-end: too many unnecessary polys + not optimized Off-camera geometry Unoptimized Triangles 25

Case Study From 11 to 30 FPS in 3 steps Neocore Build 3 Used QStrip to generate triangle strips, added QX Engine frustum culling Metrics graph: 30 FPS We re done Culled geometry Optimized Triangles Optimized Textures 26