TEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems

International Conference on Supercomputing, June 2013
TEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems
Joan-Manuel Parcerisa, Computer Architecture Department, Universitat Politecnica de Catalunya
Polychronis Xekalakis, Intel Labs, Intel Corporation

Motivation
Mobile GPU Simulator Requirements
1. Support for Mobile Applications
2. Full-System GPU Simulation
   [Chart: % of GPU time for Android, Angry Birds and Advertisement]
3. GPU & Screen Power Models
   [Chart: % of energy for System Memory, GPU and Screen]
4. Flexible GPU Timing Simulator
   - Tile-Based Deferred Rendering
   - Immediate Mode Rendering
   [Diagram labels: GPU, On-Chip Memory, External Memory]
Slide annotations on these requirements:
- Not supported by any publicly available GPU simulator
- Tailored towards desktop-like, power-hungry GPUs

Outline
1. Motivation
2. Simulation Infrastructure
   2.1. OpenGL ES Trace Generation
   2.2. GPU Functional Simulation
   2.3. Cycle-Accurate Timing Simulation
   2.4. Power Model
   2.5. Image Quality Assessment
3. Conclusions

Simulation Infrastructure - Overview
[Diagram legend: Tools unmodified, Tools adapted, Tools created from scratch, Trace files, Statistics]
OpenGL Trace Generation
- Mobile Applications
- Android 4.2 Jelly Bean on the Android Emulator (Virtual GPU), running on a Desktop
- OpenGL ES Trace Generator
- Output: OpenGL ES Trace
  - Vertex/Fragment programs (GLSL)
  - Textures
  - Geometry
GPU Functional Simulation
- Instrumented Gallium3D (softpipe driver)
- Outputs: GPU Trace (GPU assembly instructions, memory addresses) and Frames
GPU Timing Simulation
- Cycle-Accurate GPU Simulator
- McPAT
- Outputs: GPU Execution Time, GPU Energy
Screen Power Model
- Output: Screen Energy
Image Quality Assessment
- Output: Image Quality

OpenGL ES Trace Generator
Support for multiple applications and OpenGL ES contexts
[Diagram: Mobile Game, Virtual Buttons and Surface Flinger run on Android 4.2 Jelly Bean inside the Android Emulator. Their OpenGL ES calls, such as glDrawArrays(...), go through the Virtual GPU to the OpenGL ES Trace Generator and on to the Desktop GPU of the Desktop.]
Wrapper in the OpenGL ES Trace Generator:
    void glDrawArrays(...) {
        saveCommandInfo(...);
        real_glDrawArrays(...);
    }
OpenGL ES Trace
- Thread: Mobile Game, OpenGL Context: A
  - OpenGL ES Command List
  - Shaders
  - Geometry
  - Textures
- Thread: Virtual Buttons, OpenGL Context: B
  - OpenGL ES Command List
  - Shaders
  - Geometry
  - Textures
- Thread: Surface Flinger, OpenGL Context: C
  - OpenGL ES Command List
  - Shaders
  - Geometry
  - Textures
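The wrapper on this slide records each command and then forwards it to the real driver. The following is a minimal sketch of that pattern, not the TEAPOT code: the names `TraceEntry`, `trace_log`, `save_command_info`, `traced_draw_arrays`, `real_draw_arrays` and `stub_draw_arrays`, and the fixed-size log, are all hypothetical. The draw call's arguments follow the standard `glDrawArrays(mode, first, count)` signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one intercepted draw command. */
typedef struct {
    int thread_id;    /* calling thread, e.g. the game or Surface Flinger */
    int context_id;   /* OpenGL ES context the command belongs to */
    unsigned mode;    /* primitive type passed to the draw call */
    int first;        /* first vertex index */
    int count;        /* number of vertices */
} TraceEntry;

enum { TRACE_CAPACITY = 1024 };   /* illustrative fixed-size log */
TraceEntry trace_log[TRACE_CAPACITY];
size_t trace_len = 0;

/* Thread and context of the caller; a real tracer would query these. */
int current_thread_id = 0;
int current_context_id = 0;

/* Pointer to the driver's real entry point (assumed to be set at start-up). */
typedef void (*DrawArraysFn)(unsigned mode, int first, int count);
DrawArraysFn real_draw_arrays = NULL;

/* Stand-in for the real driver so the sketch runs without a GPU. */
int forwarded_calls = 0;
void stub_draw_arrays(unsigned mode, int first, int count) {
    (void)mode; (void)first; (void)count;
    ++forwarded_calls;
}

/* Append the command, tagged with its thread and context, to the log. */
void save_command_info(unsigned mode, int first, int count) {
    assert(trace_len < TRACE_CAPACITY);
    trace_log[trace_len++] = (TraceEntry){
        current_thread_id, current_context_id, mode, first, count
    };
}

/* Wrapper: record first, then forward so rendering is unaffected. */
void traced_draw_arrays(unsigned mode, int first, int count) {
    save_command_info(mode, first, count);
    if (real_draw_arrays != NULL) {
        real_draw_arrays(mode, first, count);
    }
}
```

Tagging every entry with a thread and context ID is what lets one log be split later into per-application command lists, as in the Mobile Game, Virtual Buttons and Surface Flinger traces above.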

GPU Functional Simulation
Input: OpenGL ES Trace
- Vertex/Fragment programs (GLSL)
- Textures
- Geometry
Gallium3D
- OpenGL ES front-end
- Intermediate Representation: TGSI (Tungsten Graphics Shader Infrastructure)
- Instrumented Softpipe Driver
  - Software Rasterizer
  - TGSI Emulator
Output: GPU Trace
Information stored per GPU command:
- Thread ID
- OpenGL ES Context ID
- GPU Assembly Instructions (TGSI)
- Memory addresses referenced for fetching vertices, texels and pixels...
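The per-command information listed above could be held in a record like the one sketched here. This is an illustration only: the type and function names (`GpuCommandRecord`, `MemAccess`, `record_instruction`, `record_access`, `count_accesses`) and the fixed capacities are assumptions, and the actual TEAPOT trace format is not described on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_INSTRUCTIONS = 256, MAX_ACCESSES = 1024 };  /* illustrative limits */

/* What a memory access was fetching, following the slide's three kinds. */
typedef enum { ACCESS_VERTEX, ACCESS_TEXEL, ACCESS_PIXEL } AccessKind;

typedef struct {
    AccessKind kind;
    uint64_t address;
} MemAccess;

/* Hypothetical layout of the information stored per GPU command. */
typedef struct {
    int thread_id;
    int context_id;                              /* OpenGL ES context ID */
    const char *instructions[MAX_INSTRUCTIONS];  /* TGSI instructions as text */
    size_t num_instructions;
    MemAccess accesses[MAX_ACCESSES];
    size_t num_accesses;
} GpuCommandRecord;

/* Append one executed shader instruction to the record. */
void record_instruction(GpuCommandRecord *rec, const char *tgsi_text) {
    assert(rec != NULL && rec->num_instructions < MAX_INSTRUCTIONS);
    rec->instructions[rec->num_instructions++] = tgsi_text;
}

/* Append one referenced memory address to the record. */
void record_access(GpuCommandRecord *rec, AccessKind kind, uint64_t address) {
    assert(rec != NULL && rec->num_accesses < MAX_ACCESSES);
    rec->accesses[rec->num_accesses++] = (MemAccess){ kind, address };
}

/* Count accesses of one kind, e.g. to size a texture-cache study. */
size_t count_accesses(const GpuCommandRecord *rec, AccessKind kind) {
    assert(rec != NULL);
    size_t n = 0;
    for (size_t i = 0; i < rec->num_accesses; ++i) {
        if (rec->accesses[i].kind == kind) {
            ++n;
        }
    }
    return n;
}
```

A timing simulator can replay such records in order, feeding the instruction list to its shader-processor model and the address list to its cache model.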

GPU Timing Simulator: Immediate-Mode Rendering
[Diagram legend: Fixed-Function Stage, Programmable Stage, Memory Hierarchy]
- Input: GPU Trace (GPU Command 0, GPU Command 1, GPU Command 2, ...)
- Command Processor
- Geometry Unit: Vertex Fetcher, Vertex Processor, Primitive Assembly
- Raster Unit 0, Raster Unit 1: Rasterizer, Early Depth Test, Fragment Processor
- Memory hierarchy labels: Vertex, Pixel, Texture, ALU, LD/ST, L2, Memory Controller

GPU Timing Simulator: Tile-Based Deferred Rendering
[Diagram legend: Fixed-Function Stage, Programmable Stage, Memory Hierarchy]
- Input: GPU Trace (GPU Command 0, GPU Command 1, GPU Command 2, ...)
- Command Processor
- Geometry Unit: Vertex Fetcher, Vertex Processor, Primitive Assembly
- Tiling Engine: Polygon List Builder, Tile Scheduler
- Raster Unit 0, Raster Unit 1: Rasterizer, Early Depth Test, Fragment Processor
- Memory hierarchy labels: Vertex, Tile, Texture, ALU, LD/ST, Color Buffer, Z-Buffer, L2, Memory Controller
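The slide names a Polygon List Builder but does not say how it works. One common approach in tile-based renderers is to bin each primitive by its screen-space bounding box, and the sketch below assumes that approach. The tile size, grid dimensions and all names (`BBox`, `TileList`, `tile_lists`, `bin_primitive`) are hypothetical, and nothing here is claimed to match TEAPOT's simulator.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative screen of 4x4 tiles, 16x16 pixels each. */
enum { TILE_SIZE = 16, TILES_X = 4, TILES_Y = 4, MAX_PRIMS_PER_TILE = 64 };

typedef struct { int x0, y0, x1, y1; } BBox;  /* inclusive pixel bounds */

typedef struct {
    int prim_ids[MAX_PRIMS_PER_TILE];
    size_t count;
} TileList;

TileList tile_lists[TILES_Y][TILES_X];  /* one primitive list per tile */

static int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Add a primitive to the list of every tile its bounding box overlaps. */
void bin_primitive(int prim_id, BBox b) {
    assert(b.x0 <= b.x1 && b.y0 <= b.y1);
    /* Skip primitives that lie entirely outside the screen. */
    if (b.x1 < 0 || b.y1 < 0 ||
        b.x0 >= TILES_X * TILE_SIZE || b.y0 >= TILES_Y * TILE_SIZE) {
        return;
    }
    int tx0 = clamp_int(b.x0 / TILE_SIZE, 0, TILES_X - 1);
    int tx1 = clamp_int(b.x1 / TILE_SIZE, 0, TILES_X - 1);
    int ty0 = clamp_int(b.y0 / TILE_SIZE, 0, TILES_Y - 1);
    int ty1 = clamp_int(b.y1 / TILE_SIZE, 0, TILES_Y - 1);
    for (int ty = ty0; ty <= ty1; ++ty) {
        for (int tx = tx0; tx <= tx1; ++tx) {
            TileList *list = &tile_lists[ty][tx];
            assert(list->count < MAX_PRIMS_PER_TILE);
            list->prim_ids[list->count++] = prim_id;
        }
    }
}
```

Once all geometry is binned, a tile scheduler can hand one tile at a time to a raster unit, which then needs color and depth storage for that tile only.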

GPU Timing Simulator: Fragment/Vertex Processors
- Simple in-order 4-stage pipeline
- Multi-warp execution
- Vectorial ISA
- Texture Sampling Units
Pipeline stages: Instruction Fetch, Instruction Decode, Execution, WriteBack
[Diagram components: Warp scheduler, Instruction Memory, Constant Register File, Input Register File, Temporal Register File, Output Register File, Operand buffering & routing, SIMD ALU, SFU, MEM UNIT (Pixel), TEX UNIT (Texture)]

GPU Power Model
Based on McPAT
TEAPOT extensions:
- Multiple data caches per core
- Read-only caches
- Specialized graphics hardware (texture sampling units...)
- Output file in JSON format
- Directly called from timing simulator
Configuration file (GPU description):
- num_raster_units
- num_geometry_units
- num_fragment_procs
- num_vertex_procs
- num_warps_per_proc
- ...
[Diagram: the configuration file gives McPAT the GPU description, from which it reports Area and Leakage. The Cycle-Accurate GPU Simulator supplies Activity Factors, from which McPAT reports Dynamic Power.]

Screen Power Model
OLED displays
- Consume different energy depending on the colors
- Screen energy depends on the output generated by the GPU
OLED-based displays power model
- Provides three functions, f(r), f(g), f(b), that map pixel intensity into energy consumption
M. Dong, Y.-S. K. Choi, and L. Zhong. Power Modeling of Graphical User Interfaces on OLED Displays. In Proc. of DAC, pages 652-657, 2009.
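A per-channel model of this kind can be applied to a rendered frame by summing f(r) + f(g) + f(b) over all pixels. The sketch below shows only that summation. The three curves `f_r`, `f_g` and `f_b` are linear placeholders with made-up coefficients in arbitrary units, not values from Dong et al. or from TEAPOT, and the names `Pixel` and `frame_power` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned char r, g, b; } Pixel;  /* 8-bit RGB */

/* Placeholder channel curves: linear in intensity, coefficients invented
 * for illustration. A real model would use curves fitted to measurements
 * of a specific display. */
double f_r(unsigned char v) { return 1.0 * v / 255.0; }
double f_g(unsigned char v) { return 1.5 * v / 255.0; }
double f_b(unsigned char v) { return 2.5 * v / 255.0; }

/* Estimated screen power for one frame, in the curves' arbitrary units. */
double frame_power(const Pixel *frame, size_t num_pixels) {
    assert(frame != NULL || num_pixels == 0);
    double total = 0.0;
    for (size_t i = 0; i < num_pixels; ++i) {
        total += f_r(frame[i].r) + f_g(frame[i].g) + f_b(frame[i].b);
    }
    return total;
}
```

With any such model a black frame costs nothing in the pixel term, which is why screen energy depends on what the GPU renders. Multiplying the per-frame power by the time the frame stays on screen gives energy.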

Image Quality Assessment
Image Quality Metrics
- Based on per-pixel errors
  - MSE (Mean-Squared Error)
  - PSNR (Peak Signal-to-Noise Ratio)
- Based on the human visual perception system
  - MSSIM (Mean Structural SIMilarity Index)
    Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli. Image Quality Assessment: from Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, 2004.
Require reference noise-free image for comparison
Evaluate distortion when trading quality for energy
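As an illustration of the per-pixel metrics, here is a minimal sketch of MSE between a reference image and a test image, both treated as flat arrays of 8-bit samples. The function name `mean_squared_error` and the flat-array layout are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the squared sample differences between two equally sized images. */
double mean_squared_error(const unsigned char *ref,
                          const unsigned char *test,
                          size_t num_samples) {
    assert(ref != NULL && test != NULL && num_samples > 0);
    double sum = 0.0;
    for (size_t i = 0; i < num_samples; ++i) {
        double diff = (double)ref[i] - (double)test[i];
        sum += diff * diff;
    }
    return sum / (double)num_samples;
}
```

PSNR follows from MSE by the standard definition PSNR = 10 * log10(MAX^2 / MSE), where MAX is 255 for 8-bit samples. It is undefined when the two images are identical, because MSE is then 0.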

Conclusions
The TEAPOT toolset is tailored towards the mobile segment since it:
- Runs unmodified Android applications
- Estimates performance, energy, area and image quality of mobile graphics systems
- Provides a flexible timing simulator, supporting Immediate-Mode and Tile-Based Deferred Rendering
- Reports statistics:
  - Per Application, including Android OS (full-system)
  - Per Frame
  - Per Component: GPU, System Memory and Screen