Saving the Planet: Designing Low-Power, Low-Bandwidth GPUs
Alan Tsai, Business Development Manager, ARM
Saving the Planet? Really?
Photo courtesy of NASA.
Mobile GPU design is all about power
- It's not about the battery
- It's about heat
- The power you have now is all you're ever going to get
Outline
- Power and memory bandwidth
- Three ways to reduce memory bandwidth
  - Tile-based rendering
  - Transaction elimination
  - Advanced texture compression
- Conclusion
Where does the power go?
- System power
- Radio, GPS, Wi-Fi, etc.
- Display
- Computing
- Static power
- Dynamic power
- Memory bandwidth
Does bandwidth matter?
Rule of thumb: the energy cost of transferring one byte is about 0.15 nJ/byte *
* Please take with several grains of salt. Assumes a 2x32 LPDDR2 memory system and includes memory controller, DDR PHY, memory, I/O, and other assumptions. Your mileage may vary. Void where prohibited.
Does bandwidth matter?
The assumed memory system can deliver 4 to 8 GB/s
4 to 8 GB/s * 0.15 nJ/byte = 600 to 1200 mW
Yes, bandwidth matters
A Trip Down the Graphics Pipeline
[Diagram: pipeline stages VERTEX SHADER, TA, RAST, FRAGMENT SHADER, Z-TEST, BLEND, RESOLVE; GPU memory holds Vertex Buffers, Textures, and the MS Z, MS C, MS C, and C buffers]
Observations
- Data explosion from per-vertex to per-fragment to per-sample
- Frame buffer bandwidth dominates, especially when multi-sampled
What to do?
- Buffers are too big to cache
Tile-based rendering pipeline
[Diagram: pipeline stages VERTEX SHADER, TA, RAST, FRAGMENT SHADER, Z-TEST, BLEND, RESOLVE; buffers MS Z, MS C, MS C, C; GPU memory holds Vertex Buffers, Polygon Lists, Textures; screen tiles numbered 1 to 9]
Tradeoff
- Vertices must flow through memory, but sample buffers are now on-chip
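The polygon lists in the diagram come from a binning pass that records which triangles touch which tile. The following is a minimal sketch of that idea, not any vendor's implementation: it assumes a 16-pixel tile edge and conservative bounding-box binning, and the names `TILE_SIZE` and `bin_triangles` are hypothetical.

```python
from collections import defaultdict

TILE_SIZE = 16  # assumed tile edge in pixels; not taken from the slides


def bin_triangles(triangles, width, height, tile=TILE_SIZE):
    """Return {(tile_x, tile_y): [triangle indices]} using bounding boxes.

    Each triangle is a sequence of three (x, y) screen-space points.
    Bounding-box binning is conservative: a tile may list a triangle
    that does not actually cover any of its pixels.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles that lie entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the tile grid.
        tx0 = max(int(min(xs)) // tile, 0)
        tx1 = min(int(max(xs)) // tile, tiles_x - 1)
        ty0 = max(int(min(ys)) // tile, 0)
        ty1 = min(int(max(ys)) // tile, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)
```

Each tile can then be rendered from its own list with the sample buffers held on-chip. The cost is the extra memory traffic for the lists themselves, which is the tradeoff the slide names.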
Tile-based rendering is a win!
Used in the majority of mobile GPUs
Variations
- ARM Mali: tile-based direct rendering, small tiles
- Imagination SGX: tile-based deferred rendering, small tiles
- Qualcomm Adreno: chunk-based direct rendering, large tiles (chunks)
What about tile writeback?
[Diagram: pipeline stages VERTEX SHADER, TA, RAST, FRAGMENT SHADER, Z-TEST, BLEND, RESOLVE; buffers MS Z, MS C, MS C, C; GPU memory holds Vertex Buffers, Polygon Lists, Textures]
Is it really a problem?
Screens keep getting bigger
Apply our rule of thumb:

Screen   Resolution    Bytes/pixel  FPS  Power (mW)
VGA      640 x 480     2            30   3
WXGA     1280 x 768    4            30   18
FHD      1920 x 1080   4            60   75
MB Pro   2880 x 1800   4            60   187
4K       3840 x 2160   4            60   299

Yes, this is a problem we have to deal with
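The power column follows from multiplying the frame buffer write rate by the 0.15 nJ/byte rule of thumb. The short sketch below redoes that arithmetic, assuming one full frame buffer write per displayed frame; the function and constant names are illustrative only.

```python
ENERGY_PER_BYTE_NJ = 0.15  # rule of thumb from the talk, in nanojoules per byte


def writeback_power_mw(width, height, bytes_per_pixel, fps,
                       nj_per_byte=ENERGY_PER_BYTE_NJ):
    """Power in milliwatts to write one full frame buffer per frame."""
    bytes_per_second = width * height * bytes_per_pixel * fps
    watts = bytes_per_second * nj_per_byte * 1e-9
    return watts * 1e3


# (name, width, height, bytes per pixel, frames per second) from the table
SCREENS = [
    ("VGA", 640, 480, 2, 30),
    ("WXGA", 1280, 768, 4, 30),
    ("FHD", 1920, 1080, 4, 60),
    ("MB Pro", 2880, 1800, 4, 60),
    ("4K", 3840, 2160, 4, 60),
]

for name, w, h, bpp, fps in SCREENS:
    print(f"{name:7s} {writeback_power_mw(w, h, bpp, fps):6.1f} mW")
```

Rounded to whole milliwatts, the results match the table's power column.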
What can we do?
Observation: not everything in the scene changes every frame
This is true even in high-end FPS games
- Skybox
- HUD
- Static scene elements
If a tile hasn't changed, we don't have to write it.
Transaction elimination
[Diagram: grid of per-tile signatures (sig)]
- Maintain a list of signatures for each tile...
- Compare to signatures calculated for frame N+1...
- Where signatures match, don't write the tile
Surprisingly effective, even on FPS games and video
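The signature comparison can be sketched in software. The slides do not say which signature function the hardware uses, so this example assumes CRC-32 from Python's `zlib` as a stand-in, a 16-pixel tile, and a tightly packed frame buffer. `tile_signatures` and `tiles_to_write` are hypothetical names.

```python
import zlib


def tile_signatures(frame, width, height, tile=16, bytes_per_pixel=4):
    """Return {(tile_x, tile_y): signature} for a tightly packed frame buffer.

    CRC-32 is an assumed stand-in for the hardware's signature function.
    """
    stride = width * bytes_per_pixel
    sigs = {}
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            crc = 0
            # Fold every row of the tile into one running CRC.
            for y in range(ty, min(ty + tile, height)):
                start = y * stride + tx * bytes_per_pixel
                end = y * stride + min(tx + tile, width) * bytes_per_pixel
                crc = zlib.crc32(frame[start:end], crc)
            sigs[(tx // tile, ty // tile)] = crc
    return sigs


def tiles_to_write(prev_sigs, new_sigs):
    """Tiles whose signature changed and therefore must be written back."""
    return [t for t, sig in new_sigs.items() if prev_sigs.get(t) != sig]
```

Any tile missing from the returned list keeps its contents from the previous frame, so its write transaction is skipped. A signature collision would leave a stale tile on screen, which is why the choice of signature function matters in a real design.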
But how well does it work really?
Angry Birds characters and images are copyright 2009-2012, Rovio Entertainment Ltd. Used by permission.
But how well does it work really?
<Video>
- Save ~75% write bandwidth
- Save ~50% total bandwidth
Saving the Planet?
Photo courtesy of NASA.
Saving the Planet?
Angry Birds is played ~200 million minutes every day
http://thenextweb.com/apps/2011/02/16/angry-birds-gamers-spend-200-million-minutes-playing-each-day/
If all these were played using Transaction Elimination
- 3.1 MB of bandwidth would be saved per frame (average)
- 3.1 MB/frame * 60 FPS * 60 s/min * 200M min/day = 2.2 exabytes/day
Apply the rule of thumb
- 2.2 exabytes/day * 0.15 nJ/byte = 335 MJ/day, or 34 megawatt-hours per year
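The slide skips the unit conversion from joules per day to megawatt-hours per year. The chain can be written out as follows, assuming 1 MB = 10^6 bytes and a 365-day year (neither is stated on the slide):

```latex
\begin{align*}
B &= 3.1\times10^{6}\,\tfrac{\text{B}}{\text{frame}}
     \times 60\,\tfrac{\text{frames}}{\text{s}}
     \times 60\,\tfrac{\text{s}}{\text{min}}
     \times 2\times10^{8}\,\tfrac{\text{min}}{\text{day}}
   \approx 2.23\times10^{18}\,\tfrac{\text{B}}{\text{day}} \\
E_{\text{day}} &= 2.23\times10^{18}\,\tfrac{\text{B}}{\text{day}}
     \times 0.15\times10^{-9}\,\tfrac{\text{J}}{\text{B}}
   \approx 3.35\times10^{8}\,\tfrac{\text{J}}{\text{day}}
   = 335\,\tfrac{\text{MJ}}{\text{day}} \\
E_{\text{year}} &= \frac{3.35\times10^{8}\,\text{J/day}}{3.6\times10^{6}\,\text{J/kWh}}
     \times 365\,\tfrac{\text{days}}{\text{year}}
   \approx 93\,\tfrac{\text{kWh}}{\text{day}} \times 365\,\tfrac{\text{days}}{\text{year}}
   \approx 34\,\tfrac{\text{MWh}}{\text{year}}
\end{align*}
```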
What's Next?
[Diagram: pipeline stages VERTEX SHADER, TA, RAST, FRAGMENT SHADER, Z-TEST, BLEND, RESOLVE; buffers MS Z, MS C, MS C, C; GPU memory holds Vertex Buffers, Polygon Lists, Textures]
Texture fetch bandwidth is the biggest remaining problem
The answer is obvious: compression
Texture Compression: Problems
- No universally supported formats
  - Desktop: S3TC, BPTC, RGTC
  - Mobile: ETC1, ATITC, PVRTC, S3TC
- Limited choice of bit rates and formats: RGB{A}, 4bpp / 8bpp
- Quality not that great
News Flash!
- ETC2 / EAC are standard in OpenGL ES 3.0, OpenGL 4.3
  - 4bpp R/RGB, 8bpp RG/RGBA
  - Better quality
- And here comes ASTC in OpenGL ES 3.0
Adaptive Scalable Texture Compression
Goals
- Maximum flexibility
- Quality
- Functionality
- Scalable bit rate: from 8bpp down to <1bpp in fine steps
- Orthogonality: any number of components at any bit rate
- Adaptive: number of components and pixel format are specified locally
- Both 2D and 3D textures
- Both LDR and HDR pixel formats
- Significant quality improvement
Color Formats: Codecs Today, All Major Players
[Chart: color formats (L, LA, RGB, RGBA, RGB+A, X+Y, XY+Z) against bits/pixel (1 to 8), with cells labeled by codec: ETC, PVRTC, BC1, BC2, BC3, BC4, BC5, BC6 (HDR), BC7]
Color Formats: Codecs Today, Low Dynamic Range
[Chart: color formats (L, LA, RGB, RGBA, RGB+A, X+Y, XY+Z) against bits/pixel (1 to 8)]
Color Formats: ASTC, Low Dynamic Range
[Chart: color formats (L, LA, RGB, RGBA, RGB+A, X+Y, XY+Z) against bits/pixel (1 to 8)]
ASTC Bit Rates
- Standard block-based paradigm
- Generalized to 3D
- Unusually large number of block sizes

2D Bit Rates                          3D Bit Rates
4x4    8.00 bpp    10x5   2.56 bpp    3x3x3  4.74 bpp    5x5x4  1.28 bpp
5x4    6.40 bpp    10x6   2.13 bpp    4x3x3  3.56 bpp    5x5x5  1.02 bpp
5x5    5.12 bpp    8x8    2.00 bpp    4x4x3  2.67 bpp    6x5x5  0.85 bpp
6x5    4.27 bpp    10x8   1.60 bpp    4x4x4  2.00 bpp    6x6x5  0.71 bpp
6x6    3.56 bpp    10x10  1.28 bpp    5x4x4  1.60 bpp    6x6x6  0.59 bpp
8x5    3.20 bpp    12x10  1.07 bpp
8x6    2.67 bpp    12x12  0.89 bpp
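Every rate in the table comes from one rule: an ASTC block is always 128 bits, so the bit rate is 128 divided by the number of texels in the block footprint. The sketch below applies that rule; the helper names are hypothetical, and the size estimate assumes partial blocks at the texture edges are padded to whole blocks.

```python
ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint


def astc_bpp(*footprint):
    """Bits per texel for a 2D (w, h) or 3D (w, h, d) block footprint."""
    texels = 1
    for dim in footprint:
        texels *= dim
    return ASTC_BLOCK_BITS / texels


def astc_texture_bytes(width, height, block_w, block_h):
    """Compressed size of one 2D mip level; edge blocks are padded (assumed)."""
    blocks_x = (width + block_w - 1) // block_w
    blocks_y = (height + block_h - 1) // block_h
    return blocks_x * blocks_y * ASTC_BLOCK_BITS // 8


for footprint in [(4, 4), (6, 6), (8, 8), (12, 12), (3, 3, 3), (6, 6, 6)]:
    label = "x".join(str(d) for d in footprint)
    print(f"{label:6s} {astc_bpp(*footprint):.2f} bpp")
```

Because the block size is fixed and only the footprint varies, the bit rate can move in fine steps without changing the block layout in memory.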
Quality Comparison: RGB LDR, 2bpp
- Kodak test set: 24 natural RGB images
- PSNR comparison, ASTC vs PVRTC 2bpp
[Chart: PSNR in dB (axis 24 to 40) per image (1 to 24) for ASTC 8x8 and PVRTC 2bpp]
Image Comparison
[Images: ORIGINAL; ASTC 6x6, PSNR 33.5 dB; S3TC 4 bpp, PSNR 30.7 dB]
ASTC at 3.56 bpp vs S3TC at 4 bpp
- 2.8 dB PSNR advantage
- 11% lower bit rate
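The slides do not say how the PSNR figures were computed, for example per channel or on luminance. As a reference point, here is a sketch of the common definition for 8-bit data, treating all channel values as one flat sequence. `psnr_db` is a hypothetical helper name.

```python
import math


def psnr_db(original, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of channel values."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over every channel value.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

On this scale a 2.8 dB advantage corresponds to a mean squared error about 1.9 times lower, since 10^(2.8/10) is roughly 1.9.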
Summary
- Mobile GPU design is all about power
- Memory bandwidth is a key contributor to power
- Standard and not-so-standard tricks for controlling bandwidth:
  - Tile-based rendering
  - Transaction elimination
  - Advances in texture compression
- Use Mali GPU to save the planet
Thanks!