Power Efficiency for Software Algorithms Running on Graphics Processors
Björn Johnsson, Per Ganestam, Michael Doggett, Tomas Akenine-Möller

Overview
  Motivation
  Goal
  Project
  Applications
  Methodology
  Results
  Observations

Motivation
  Energy efficiency is increasingly important
    Phones, tablets, laptops, desktops...
  Hard area
    Harder to analyze than regular performance
    Need hardware and/or hardware support
    Less intuitive / harder to predict

Motivation
  Lots of papers looking at algorithms for power-efficient hardware
  What can be done from the software side on current hardware?
  Learn more about energy and power

Goal
  Say we want to optimize to lower energy usage
    Do we have to measure power?
    Or can we make an estimate based on rendering times alone?

Project
  Two common graphics problems
  Implemented several different solutions to those problems
  Measured their power usage and rendering time

Applications
  Primary visibility and shading
    Forward rendering
    Forward rendering with pre-z pass
    Deferred shading

Applications
  Shadow algorithms
    Stencil shadow volumes
    Shadow mapping
    Variance shadow mapping

Applications
  OpenGL / OpenGL ES
  Full speed (no capping)
    Widely different platforms give different frame times (5 ms to 2 s)
  Timestamp at beginning and end of frame
    Only integrate over actual rendering time
  Animated camera path (~2000 frames)

Methodology
  We have built a measurement station
    Measures at 40 kHz
    4 ACS710 Hall effect current sensors (<12 A)
    2 shunt current sensors (<1 A)
  Connected between GPU and power source
    Different places on different platforms

Discrete graphics cards
  Connected on the PCI-Express bus
    PCI-Express bus provides <=75 W
    Measured through an Ultraview PCIeEXT-16HOT expander card
  PCI-Express power connectors
    8-pin provides <=150 W
    6-pin provides <=75 W
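As a rough illustration of how the separate supply paths add up, the sketch below sums voltage times current over each measured source and compares each one against the limits listed above. The source names, the voltage and current readings, and the single-voltage treatment of the slot are all assumptions made for this example; they are not measurements from the talk.

```python
# Upper limits per supply path, as listed on the slide (watts).
LIMITS_W = {"pcie_slot": 75.0, "8_pin": 150.0, "6_pin": 75.0}


def source_power_w(volts, amps):
    """Electrical power of one supply path: P = V * I."""
    return volts * amps


def board_power_w(readings):
    """Total board power as the sum over all measured supply paths.

    `readings` maps a source name to a (volts, amps) pair.
    """
    return sum(source_power_w(v, i) for v, i in readings.values())


def sources_over_limit(readings, limits=LIMITS_W):
    """Names of sources whose power exceeds the listed limit."""
    return [name for name, (v, i) in readings.items()
            if source_power_w(v, i) > limits[name]]


# Hypothetical instantaneous readings (volts, amps); simplified to one
# voltage per source.
readings = {
    "pcie_slot": (12.0, 4.0),
    "8_pin": (12.0, 10.0),
    "6_pin": (12.0, 5.0),
}

total_w = board_power_w(readings)
violations = sources_over_limit(readings)
print(total_w, violations)
```

With these made-up readings every source stays under its limit; the three limits together bound the board at 300 W.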

Discrete graphics cards
  AMD Radeon 7970 (28 nm)
  NVIDIA GeForce GTX 580 (40 nm)

SoC #1
  Intel Sandy Bridge, HD3000 GPU (32 nm)
  Connected on the motherboard's 4-pin power connector
    Provides power to CPU, GPU, and parts of the memory system
  Two runs
    Rendering pass
    Idle pass, all gl* calls removed from code

SoC #1
  Rendering pass - idle pass =?
    GPU power
    Parts of the memory power
      Memory bandwidth generated from the graphics workload
      Not including memory refresh power
    CPU power for driver execution
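The subtraction of the idle pass from the rendering pass can be sketched as below. Because the two runs are separate recordings, this sketch compares their mean power rather than subtracting sample by sample; that choice, the function names, and the sample values are assumptions for illustration only.

```python
def mean(samples):
    """Arithmetic mean of a non-empty sequence of power samples (watts)."""
    return sum(samples) / len(samples)


def workload_power_w(render_pass_w, idle_pass_w):
    """Estimate power attributable to the graphics workload.

    Mean power of the rendering pass minus mean power of the idle pass.
    What remains covers GPU power, workload-related memory power, and
    CPU power for driver execution, as listed on the slide.
    """
    return mean(render_pass_w) - mean(idle_pass_w)


# Hypothetical traces in watts (not data from the talk).
render_pass = [3.0, 3.2, 2.8]
idle_pass = [1.0, 1.1, 0.9]

estimate_w = workload_power_w(render_pass, idle_pass)
print(estimate_w)
```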

SoC #2
  iPhone 4S, PowerVR SGX543MP2 GPU (45 nm)
  Connected on battery connectors
    Provides power to everything
  Two runs
    Rendering pass
    Idle pass

SoC #2
  Rendering pass - idle pass =?
    GPU power
    Memory power
      Only for memory bandwidth generated from the graphics workload
      Not including memory refresh power
    CPU power for driver execution

Results
  What we measure:
    High-frequency power data (40 kHz)
    Rendering time per frame
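The two measured quantities combine into an energy figure per frame: power is integrated only over the interval between the frame's start and end timestamps. The following is a minimal sketch assuming a power trace already converted to watts and sampled uniformly at 40 kHz, with timestamps given in seconds relative to the first sample. The rectangle-rule integration, the function name, and the constant trace are assumptions, not the authors' implementation.

```python
SAMPLE_RATE_HZ = 40_000  # sampling rate stated in the talk


def frame_energy_joules(power_w, t_start_s, t_end_s,
                        sample_rate_hz=SAMPLE_RATE_HZ):
    """Integrate power over [t_start_s, t_end_s) with a rectangle rule.

    Each sample is taken to represent 1/sample_rate_hz seconds, so the
    energy is the sum of the samples in the interval times that step.
    """
    dt = 1.0 / sample_rate_hz
    first = int(round(t_start_s * sample_rate_hz))
    last = int(round(t_end_s * sample_rate_hz))
    return sum(power_w[first:last]) * dt


# Hypothetical trace: one second at a constant 200 W.
trace = [200.0] * SAMPLE_RATE_HZ

# A 20 ms frame that starts 10 ms into the trace.
energy = frame_energy_joules(trace, 0.010, 0.030)
print(energy)  # close to 200 W * 0.020 s = 4 J
```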

GeForce GTX 580
  [Figure: plots for deferred shading and for variance shadow maps with 6 light sources; axis data not recoverable]

GeForce GTX 580: Primary rendering
  [Figure: Power and frame rendering time vs. frame number. Power (W) and rendering time (ms) for Forward, Pre-Z, and Deferred.]

GeForce GTX 580: Shadows
  [Figure: Power and frame rendering time vs. frame number. Power (W) and rendering time (ms) for shadow volumes, variance shadow maps, and shadow maps, with 2, 4, 6, and 8 lights.]

AMD Radeon 7970: Primary rendering
  [Figure: Forward and Deferred; plot data not recoverable]

iPhone 4S
  Higher energy than expected
    Mostly due to long rendering times
    Probably pushing sort-middle to flush buffers
  [Figure: Power (W) and rendering time (ms) vs. frame number for Forward and Pre-Z.]

Results
  What other numbers are we interested in?
    Power
    Energy
      More interesting for battery lifetime
  But what metric should we use?

Metric
  nJ/pixel
    Largely resolution independent
    Easy to divide frame into segments
    Easy to calculate fillrate for a given TDP
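The metric and the fillrate calculation it enables can be sketched as follows. Energy per pixel is the frame energy divided by the number of pixels, and the sustainable fillrate for a power budget is that budget divided by the energy per pixel. The function names and the example numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def energy_per_pixel_nj(frame_energy_j, width, height):
    """Frame energy in joules divided by pixel count, in nJ/pixel."""
    return frame_energy_j / (width * height) * 1e9


def fillrate_pixels_per_s(tdp_w, nj_per_pixel):
    """Pixels per second sustainable within a power budget.

    Power (J/s) divided by energy per pixel (J/pixel) gives pixels/s.
    """
    return tdp_w / (nj_per_pixel * 1e-9)


# Hypothetical frame: 2 J spent rendering a 1000 x 1000 image.
nj = energy_per_pixel_nj(2.0, 1000, 1000)

# Hypothetical budget: 100 W at 500 nJ/pixel.
rate = fillrate_pixels_per_s(100.0, 500.0)
print(nj, rate)
```

In this example the frame costs about 2000 nJ/pixel, and the 100 W budget sustains about 200 million pixels per second.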

Energy, nJ/pixel: primary visibility
  (FR = forward rendering, ZR = forward rendering with pre-z pass, DR = deferred shading)

  Platform        FR      ZR      DR
  GTX 580         1,443   722     511
  Radeon 7970     607     512     489
  Sandy Bridge    872     314     280
  iPhone 4S       2,234   2,015   -

Energy, nJ/pixel: shadows
  (SV = stencil shadow volumes, SM = shadow mapping, VSM = variance shadow mapping)

  Platform        SV      SM      VSM
  GTX 580         1,325   447     532
  Radeon 7970     953     469     804
  Sandy Bridge    1,317   311     511
  iPhone 4S       -       461     -

Observations
  Not possible to estimate power / energy from frame times alone
  Surprisingly, pre-z proved useful on a tiled, deferred, sort-middle architecture for our application
    Pushing too much geometry?
  Still similar energy per pixel (nJ/pixel), despite rendering times that differ by an order of magnitude or more

Thanks for listening
  Any questions?