International Conference on Supercomputing, June 2013

TEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems

Joan-Manuel Parcerisa, Computer Architecture Department, Universitat Politecnica de Catalunya
Polychronis Xekalakis, Intel Labs, Intel Corporation

Motivation

Mobile GPU Simulator Requirements
1. Support for Mobile Applications
2. Full-System GPU Simulation
   [Figure: % of GPU time for Android, Angry Birds and Advertisement]
3. GPU & Screen Power Models
   [Figure: % of energy for System Memory, GPU and Screen]
4. Flexible GPU Timing Simulator
   [Figure: Tile-Based Deferred Rendering and Immediate Mode Rendering; labels: GPU, On-Chip Memory, External Memory]

- Not supported by any publicly available GPU simulator
- Tailored towards desktop-like, power-hungry GPUs

Outline

1. Motivation
2. Simulation Infrastructure
   2.1. OpenGL ES Trace Generation
   2.2. GPU Functional Simulation
   2.3. Cycle-Accurate Timing Simulation
   2.4. Power Model
   2.5. Image Quality Assessment
3. Conclusions

Simulation Infrastructure - Overview

Legend: Tools unmodified, Tools adapted, Tools created from scratch, Trace files, Statistics

- OpenGL Trace Generation
  - Runs on: Desktop
  - Components: Mobile Applications, Android 4.2 Jelly Bean, Android Emulator, Virtual GPU, OpenGL ES Trace Generator
  - Output: OpenGL ES Trace (Vertex/Fragment programs (GLSL), Textures, Geometry)
- GPU Functional Simulation
  - Component: Instrumented Gallium3D (softpipe driver)
  - Outputs: GPU Trace (GPU assembly instructions, Memory addresses), Frames
- GPU Timing Simulation
  - Components: Cycle-Accurate GPU Simulator, McPAT
  - Outputs: GPU Execution Time, GPU Energy
- Screen Power Model
  - Input: Frames
  - Output: Screen Energy
- Image Quality Assessment
  - Input: Frames
  - Output: Image Quality

OpenGL ES Trace Generator

Support for multiple applications and OpenGL ES contexts

[Diagram: Mobile Game, Virtual Buttons and Surface Flinger on Android 4.2 Jelly Bean, inside the Android Emulator with its Virtual GPU, running on a Desktop with a Desktop GPU. The Mobile Game issues gldrawarrays(...).]

The OpenGL ES Trace Generator wraps each OpenGL ES command:

    void gldrawarrays(...) {
        savecommandinfo(...);
        real_gldrawarrays(...);
    }

OpenGL ES Trace, one section per thread and context:
- Thread: Mobile Game, OpenGL Context: A
- Thread: Virtual Buttons, OpenGL Context: B
- Thread: Surface Flinger, OpenGL Context: C

Each section stores:
- OpenGL ES Command List
- Shaders
- Geometry
- Textures

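The wrapper on this slide can be sketched in plain C as follows. This is only an illustration of the save-then-forward pattern, not the TEAPOT implementation: the record layout, the names traced_draw_arrays, real_draw_arrays, trace_log and stub_draw_arrays, and the fixed-size log are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical record of one intercepted draw command.
   The real trace format is not shown in the slides. */
typedef struct {
    int thread_id;    /* thread that issued the call */
    int context_id;   /* OpenGL ES context the call belongs to */
    unsigned mode;    /* draw arguments, saved verbatim */
    int first;
    int count;
} DrawRecord;

#define MAX_RECORDS 1024
DrawRecord trace_log[MAX_RECORDS];
size_t trace_len = 0;

/* Pointer to the real entry point; set when the wrapper is installed. */
typedef void (*DrawArraysFn)(unsigned mode, int first, int count);
DrawArraysFn real_draw_arrays = NULL;

/* Identity of the caller. A real tool would query the OS and the
   current context instead of using globals (assumption). */
int current_thread_id = 0;
int current_context_id = 0;

/* Stand-in for the driver, used only to show that calls are forwarded. */
int forwarded_calls = 0;
void stub_draw_arrays(unsigned mode, int first, int count)
{
    (void)mode; (void)first; (void)count;
    forwarded_calls++;
}

/* Wrapper: save the command info, then forward the call unchanged. */
void traced_draw_arrays(unsigned mode, int first, int count)
{
    if (trace_len < MAX_RECORDS) {
        DrawRecord r = { current_thread_id, current_context_id,
                         mode, first, count };
        trace_log[trace_len++] = r;
    }
    if (real_draw_arrays != NULL) {
        real_draw_arrays(mode, first, count);
    }
}
```

Tagging every record with a thread and context ID is what lets a single log be split later into the per-application sections listed above.
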
GPU Functional Simulation

- Input: OpenGL ES Trace (Vertex/Fragment programs (GLSL), Textures, Geometry)
- Gallium3D
  - OpenGL ES front-end
  - Intermediate Representation: TGSI (Tungsten Graphics Shader Infrastructure)
  - Instrumented Softpipe Driver
    - Software Rasterizer
    - TGSI Emulator
- Output: GPU Trace

Information stored per GPU command:
- Thread ID
- OpenGL ES Context ID
- GPU Assembly Instructions (TGSI)
- Memory addresses referenced for fetching vertices, texels and pixels...

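One possible in-memory layout for the per-command information listed above is sketched below. The type and field names (GpuCommandTrace, MemAccess, AccessKind) and the helper count_accesses are hypothetical; the slides do not specify the actual GPU Trace file format.

```c
#include <stddef.h>
#include <stdint.h>

/* What a memory address was fetched for (assumed categories,
   following the slide: vertices, texels, pixels). */
typedef enum { ACCESS_VERTEX, ACCESS_TEXEL, ACCESS_PIXEL } AccessKind;

typedef struct {
    AccessKind kind;
    uint64_t address;
} MemAccess;

/* Hypothetical record for one GPU command. */
typedef struct {
    int thread_id;
    int context_id;
    const char *const *instructions;  /* TGSI text, one entry per instruction */
    size_t num_instructions;
    const MemAccess *accesses;
    size_t num_accesses;
} GpuCommandTrace;

/* Count the accesses of one kind issued by one OpenGL ES context. */
size_t count_accesses(const GpuCommandTrace *cmds, size_t n,
                      int context_id, AccessKind kind)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (cmds[i].context_id != context_id) {
            continue;
        }
        for (size_t j = 0; j < cmds[i].num_accesses; j++) {
            if (cmds[i].accesses[j].kind == kind) {
                total++;
            }
        }
    }
    return total;
}
```

Keeping the context ID in every record is what makes per-application statistics possible downstream.
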
GPU Timing Simulator - Immediate-Mode Rendering

Legend: Fixed-Function Stage, Programmable Stage, Memory Hierarchy

- Input: GPU Trace (GPU Command 0, GPU Command 1, GPU Command 2)
- Command Processor
- Geometry Unit: Vertex Fetcher, Vertex Processor, Primitive Assembly
- Raster Unit 0, Raster Unit 1: Rasterizer, Early Depth Test, Fragment Processor
- Memory: L2, Memory Controller
- Other diagram labels: Vertex, Pixel, Texture, ALU, LD/ST

GPU Timing Simulator - Tile-Based Deferred Rendering

Legend: Fixed-Function Stage, Programmable Stage, Memory Hierarchy

- Input: GPU Trace (GPU Command 0, GPU Command 1, GPU Command 2)
- Command Processor
- Geometry Unit: Vertex Fetcher, Vertex Processor, Primitive Assembly
- Tiling Engine: Polygon List Builder, Tile Scheduler
- Raster Unit 0, Raster Unit 1: Rasterizer, Early Depth Test, Fragment Processor
- Memory: Color Buffer, Z-Buffer, L2, Memory Controller
- Other diagram labels: Vertex, Tile, Texture, ALU, LD/ST

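To illustrate what a Polygon List Builder has to do, the sketch below assigns a triangle to every screen tile that its bounding box overlaps. The slides do not describe the binning algorithm, so bounding-box binning, the 16x16 tile size, and all names (bin_triangle, TILE_SIZE, Triangle) are assumptions chosen for the example.

```c
#include <stddef.h>

#define TILE_SIZE 16  /* assumed tile edge in pixels */

typedef struct { int x, y; } Point;
typedef struct { Point v[3]; } Triangle;

/* Clamp v to the range [lo, hi]. */
int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Write the row-major indices of all tiles overlapped by the triangle's
   bounding box into tile_ids (at most max_ids entries).
   Returns the number of indices written. */
size_t bin_triangle(const Triangle *t, int screen_w, int screen_h,
                    int *tile_ids, size_t max_ids)
{
    int min_x = t->v[0].x, max_x = t->v[0].x;
    int min_y = t->v[0].y, max_y = t->v[0].y;
    for (int i = 1; i < 3; i++) {
        if (t->v[i].x < min_x) min_x = t->v[i].x;
        if (t->v[i].x > max_x) max_x = t->v[i].x;
        if (t->v[i].y < min_y) min_y = t->v[i].y;
        if (t->v[i].y > max_y) max_y = t->v[i].y;
    }

    /* Triangles entirely off screen touch no tile. */
    if (max_x < 0 || max_y < 0 || min_x >= screen_w || min_y >= screen_h) {
        return 0;
    }

    /* Clip the bounding box to the screen. */
    min_x = clampi(min_x, 0, screen_w - 1);
    max_x = clampi(max_x, 0, screen_w - 1);
    min_y = clampi(min_y, 0, screen_h - 1);
    max_y = clampi(max_y, 0, screen_h - 1);

    int tiles_per_row = (screen_w + TILE_SIZE - 1) / TILE_SIZE;
    size_t n = 0;
    for (int ty = min_y / TILE_SIZE; ty <= max_y / TILE_SIZE; ty++) {
        for (int tx = min_x / TILE_SIZE; tx <= max_x / TILE_SIZE; tx++) {
            if (n < max_ids) {
                tile_ids[n++] = ty * tiles_per_row + tx;
            }
        }
    }
    return n;
}
```

Bounding-box binning is conservative: it may list a triangle in tiles it does not actually cover, which costs some extra work per tile but never drops geometry.
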
GPU Timing Simulator - Fragment/Vertex Processors

- Simple in-order 4-stage pipeline
- Multi-warp execution
- Vectorial ISA
- Texture Sampling Units

Pipeline stages: Instruction Fetch, Instruction Decode, Execution, WriteBack

Diagram components: Warp scheduler, Instruction Memory, Constant Register File, Input Register File, Temporal Register File, Output Register File, Operand buffering & routing, SIMD ALU, SFU, MEM UNIT, TEX UNIT, Pixel, Texture

GPU Power Model

- Based on McPAT
- TEAPOT extensions:
  - Multiple data caches per core
  - Read-only caches
  - Specialized graphics hardware (texture sampling units...)
  - Output file in JSON format
  - Directly called from timing simulator

Configuration file:
- num_raster_units
- num_geometry_units
- num_fragment_procs
- num_vertex_procs
- num_warps_per_proc
- ...

[Diagram: Configuration file, GPU description, Cycle-Accurate GPU Simulator, Activity Factors, McPAT; McPAT outputs: Area, Leakage, Dynamic Power]

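As a rough illustration of how activity factors and leakage combine into a total energy figure, the sketch below adds per-component dynamic energy to leakage power integrated over the run time. This is generic energy bookkeeping, not McPAT's internal model; the struct ComponentActivity, the function gpu_energy_joules and the per-access energy values are hypothetical.

```c
#include <stddef.h>

/* Activity of one hardware component over a simulated interval.
   joules_per_access would come from the power model (assumption). */
typedef struct {
    const char *name;
    double accesses;
    double joules_per_access;
} ComponentActivity;

/* Total energy = sum of dynamic energy per component
                + leakage power multiplied by elapsed time. */
double gpu_energy_joules(const ComponentActivity *parts, size_t n,
                         double leakage_watts, double seconds)
{
    double dynamic = 0.0;
    for (size_t i = 0; i < n; i++) {
        dynamic += parts[i].accesses * parts[i].joules_per_access;
    }
    return dynamic + leakage_watts * seconds;
}
```

The split mirrors the diagram: the static part depends only on the GPU description and the elapsed time, while the dynamic part scales with the activity reported by the timing simulator.
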
Screen Power Model

- OLED displays
  - Consume different energy depending on the colors
  - Screen energy depends on the output generated by the GPU
- OLED-based displays power model
  - Provides three functions, f(r), f(g), f(b), that map pixel intensity into energy consumption
  - M. Dong, Y.-S. K. Choi, and L. Zhong. Power Modeling of Graphical User Interfaces on OLED Displays. In Proc. of DAC, pages 652-657, 2009.

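A per-frame estimate built from the three channel functions could look like the sketch below. The channel curves are passed in as function pointers because the slides do not give their shape; the linear placeholder curves and their coefficients are invented for the example and are not measured values from the cited model.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } Pixel;

/* Maps one channel intensity (0..255) to an energy contribution. */
typedef double (*ChannelEnergyFn)(uint8_t intensity);

/* Placeholder curves with made-up coefficients (assumption). */
double placeholder_f_r(uint8_t v) { return 1.0 * v; }
double placeholder_f_g(uint8_t v) { return 2.0 * v; }
double placeholder_f_b(uint8_t v) { return 4.0 * v; }

/* Sum f(r) + f(g) + f(b) over every pixel of a frame. */
double frame_energy(const Pixel *frame, size_t num_pixels,
                    ChannelEnergyFn f_r, ChannelEnergyFn f_g,
                    ChannelEnergyFn f_b)
{
    double total = 0.0;
    for (size_t i = 0; i < num_pixels; i++) {
        total += f_r(frame[i].r) + f_g(frame[i].g) + f_b(frame[i].b);
    }
    return total;
}
```

Because the estimate is a function of pixel colors only, it can be computed directly from the frames produced by the functional simulator.
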
Image Quality Assessment

Image Quality Metrics
- Based on per-pixel errors
  - MSE (Mean-Squared Error)
  - PSNR (Peak Signal-to-Noise Ratio)
- Based on the human visual perception system
  - MSSIM (Mean Structural SIMilarity Index)
  - Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli. Image Quality Assessment: from Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, 2004.

- Require a reference noise-free image for comparison
- Evaluate distortion when trading quality for energy

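The simplest of these metrics, MSE, can be sketched as below for two images stored as flat arrays of 8-bit samples. The function name mse and the flat-array layout are assumptions for the example; the slides do not show how TEAPOT stores frames.

```c
#include <stddef.h>
#include <stdint.h>

/* Mean-squared error between a reference image and a test image,
   both given as n 8-bit samples. Returns 0.0 for empty input. */
double mse(const uint8_t *ref, const uint8_t *test, size_t n)
{
    if (n == 0) {
        return 0.0;
    }
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)ref[i] - (double)test[i];
        sum += d * d;
    }
    return sum / (double)n;
}
```

For 8-bit samples, PSNR in decibels follows from this value as 10 * log10(255^2 / MSE); it is undefined when the two images are identical (MSE of zero).
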
Conclusions

The TEAPOT toolset is tailored towards the mobile segment since it:
- Runs unmodified Android applications
- Estimates performance, energy, area and image quality of mobile graphics systems
- Provides a flexible timing simulator, supporting Immediate-Mode and Tile-Based Deferred Rendering
- Reports statistics:
  - Per Application, including Android OS (full-system)
  - Per Frame
  - Per Component: GPU, System Memory and Screen