1 Power Efficiency for Software Algorithms running on Graphics Processors Björn Johnsson Per Ganestam Michael Doggett Tomas Akenine-Möller
Overview 2 Motivation Goal Project Applications Methodology Results Observations
Motivation 3 Energy efficency increasingly important Phones, tablets, laptops, desktops... Hard area Harder to analyze than regular performance Need hardware and/or hardware support Less intuitive / harder to predict
Motivation 4 Lots of papers looking at algorithms for power efficient hardware What can be done from the software side on current hardware? Learn more about energy and power
Goal 5 Say we want to optimize to lower energy usage Do we have to measure power? Or can we make an estimate based on rendering times alone?
Project 6 Two common graphics problems Implemented several different solutions solving those problems Measure their power usage and rendering time
Applications 7 Primary visibility and shading Forward rendering Forward rendering with pre-z pass Deferred shading
Applications 8 Primary visibility and shading Forward rendering Forward rendering with pre-z pass Deferred shading
Applications 9 Shadow algorithms Stencil shadow volumes Shadow mapping Variance shadow mapping
Applications 10 Shadow algorithms Stencil shadow volumes Shadow mapping Variance shadow mapping
Applications 11 OpenGL / OpenGL ES Full speed (no capping) Widely different platforms gives different frame times (5ms to 2s) Timestamp at beginning and end of frame Only integrate over actual rendering time Animated camera path (~2000 frames)
Methodology 12 We have built a measurement station Measure at 40kHz 4 ACS710 Hall effect current sensors (<12A) 2 shunt current sensors (<1A) Connected between GPU and power source Different places on different platforms
Discrete graphics cards 13 Connected on the PCI-Express bus
Discrete graphics cards 14 Connected on the PCI-Express bus PCI-Express bus provides <=75W
Discrete graphics cards 15 Connected on the PCI-Express bus PCI-Express bus provides <=75W Measured through an Ultraview PCIeEXT-16HOT expander card
Discrete graphics cards 16 Connected on the PCI-Express bus PCI-Express bus provides <=75W Measured through an Ultraview PCIeEXT-16HOT expander card PCI-Express power connectors
Discrete graphics cards 17 Connected on the PCI-Express bus PCI-Express bus provides 75W Measured through an Ultraview PCIeEXT-16HOT expander card PCI-Express power connectors 8-pin provides <=150W 6-pin provides <=75W
Discrete graphics cards 18
Discrete graphics cards 19 AMD Radeon 7970 (28nm) NVIDIA GeForce GTX 580 (40nm)
SoC #1 20 Intel Sandy Bridge, HD3000 GPU (32nm) Connected on motherboards 4-pin power connector Provides power to CPU, GPU, and parts of the memory system Two runs Rendering pass Idle pass, all gl* calls removed from code
SoC #1 21 Rendering pass - idle pass =? GPU power Parts of the memory power Memory bandwith generated from the graphics workload Not including memory refresh power CPU power for driver execution
SoC #2 22 iphone 4S, PowerVR SGX543MP2 GPU (45nm) Connected on battery connectors Provides power to everything Two runs Rendering pass Idle pass
SoC #2 23 Rendering pass - idle pass =? GPU power Memory power Only for memory bandwidth generated from the graphics workload Not including memory refresh power CPU power for driver execution
SoC #2 24
Results 25 What we measure: High-frequency power data (40kHz) Rendering time per frame
GeForce GTX 580 26 360 360 160 18.748 18.756 Deferred shading 160 29.848 29.86 Variance shadow maps 6 light sources
GeForce GTX 580: Primary rendering 27 Power Forward Pre-Z Deferred 320 300 280 260 240 Power and Frame Rendering time vs Frame Number 220 Power (W) 200 30 Rendering time (ms) 20 10 Frametime Forward Pre-Z Deferred 0 0 200 400 600 800 1000 1200 1400 1600 Frame Number 0
GeForce GTX 580: Shadows 28 320 300 Power and Frame Rendering time vs Frame Number 2 lights 4 lights 6 lights 8 lights Power Shadow volumes Variance shadow maps Shadow maps Power (W) 280 260 240 220 200 40 30 Rendering time (ms) 20 10 Frametime Shadow volumes Variance shadow maps Shadow maps 0 0 500 1000 1500 2000 2500 0 Frame Number
AMD Radeon 7970: Primary rendering 29 Forward Deferred
AMD Radeon 7970: Primary rendering 30 Forward Deferred
iphone 4S 31 3300 1.3 1.2 1.1 1 3000 2700 2400 Forward Power (W) 0.9 0.8 0.7 0.6 0.5 2100 1800 1500 1200 Rendering time (ms) Pre-Z 0.4 0.3 0.2 0.1 900 600 300 0 0 200 400 600 800 1000 1200 1400 1600 Frame Number 0
iphone 4S 32 Higher energy than expected Mostly due to long rendering times Probably pushing sort-middle to flush buffers 3300 Forward Pre-Z Power (W) 1.3 1.2 1.1 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 3000 2700 2400 2100 1800 1500 1200 900 600 300 Rendering time (ms) 0 0 200 400 600 800 1000 1200 1400 1600 Frame Number 0
Results 33 What numbers are we interested in?
Results 34 What other numbers are we interested in? Power
Results 35 What other numbers are we interested in? Power Energy
Results 36 What other numbers are we interested in? Power Energy More interesting for battery lifetime
Results 37 What other numbers are we interested in? Power Energy More interesting for battery lifetime But what metric should we use?
Metric 38 nj/pixel
Metric 39 nj/pixel Largely resolution independent
Metric 40 nj/pixel Largely resolution independent Easy to divide frame into segments
Metric 41 nj/pixel Largely resolution independent Easy to divide frame into segments Easy to calculate fillrate for a given TDP
Energy, nj/pixel 42 Primary visibility FR ZR DR GTX 580 1,443 722 511 Radeon 7970 607 512 489 Sandy Bridge 872 314 280 iphone 4S 2,234 2,015 -
Energy, nj/pixel 43 Shadows SV SM VSM GTX 580 1,325 447 532 Radeon 7970 953 469 804 Sandy Bridge 1,317 311 511 iphone 4S - 461 -
Observations 44 Not possible to estimate power / energy only from frame times Surprisingly, pre-z proved useful on a tiled, deferred, sort-middle architecture for our application Pushing too much geometry? Still similar energy per pixel, despite rendering times that differ by an order of magnitude or more nj/pixel
45 Thanks for listening Any questions?