The following content is extracted from the material in the references on the last page. If any citation is wrong or any reference is missing, please contact ldvan@cs.nctu.edu.tw and I will correct the error as soon as possible. This material is for use in this course only; please do NOT redistribute it. Thank you.
Introduction to Modern GPU Hardware
Lan-Da Van (范倫達), Ph.D.
Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan
Fall, 2018
Outline
- GPU Pipeline
- GPU Hardware History
- GPU Hardware Consideration
- Modern GPU Hardware Architecture
  - NVIDIA GeForce
  - AMD (ATI) Radeon
  - IMG PowerVR
  - ARM Mali
- GPU Applications
- Summary
GPU Fundamentals: Graphics Pipeline
[Figure: a simplified graphics pipeline. CPU: Application. GPU: Vertices (3D) -> Transform & Light -> Xformed, Lit Vertices (2D) -> Assemble Primitives -> Screenspace triangles (2D) -> Rasterize -> Fragments (pre-pixels) -> Shade -> Final Pixels (Color, Depth). Also shown: Graphics State, Video Memory (Textures), Render-to-texture.]
- Note that pipe widths vary
- Many caches, FIFOs, and so on are not shown
GPU Fundamentals: Modern Graphics Pipeline
[Figure: the same pipeline with programmable stages. CPU: Application. GPU: Vertices (3D) -> Vertex Processor (in place of Transform & Light) -> Xformed, Lit Vertices (2D) -> Assemble Primitives -> Screenspace triangles (2D) -> Rasterize -> Fragments (pre-pixels) -> Fragment Processor (in place of Shade) -> Final Pixels (Color, Depth). Also shown: Graphics State, Video Memory (Textures), Render-to-texture.]
- Programmable vertex processor!
- Programmable pixel processor!
GPU Fundamentals: Modern Graphics Pipeline
[Figure: the pipeline with a geometry stage. CPU: Application. GPU: Vertices (3D) -> Vertex Processor -> Xformed, Lit Vertices (2D) -> Geometry Processor (in place of Assemble Primitives) -> Screenspace triangles (2D) -> Rasterize -> Fragments (pre-pixels) -> Fragment Processor -> Final Pixels (Color, Depth). Also shown: Graphics State, Video Memory (Textures), Render-to-texture.]
- Programmable primitive assembly!
- More flexible memory access!
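The stages named on the pipeline slides can be sketched in software. The following is only a minimal illustration, not how any real GPU or graphics API implements them: all function names are hypothetical, the vertex stage is a bare perspective divide, and the fragment stage returns a constant color. It is meant to show where the two programmable stages (vertex and fragment processors) plug into the fixed stages (primitive assembly and rasterization).

```python
# Minimal software sketch of the pipeline stages; names are illustrative only.

def vertex_processor(v):
    """Programmable vertex stage: project a 3D vertex to 2D (assumes z > 0)."""
    x, y, z = v
    return (x / z, y / z)

def assemble_primitives(verts):
    """Fixed stage: group consecutive transformed vertices into triangles."""
    return [tuple(verts[i:i + 3]) for i in range(0, len(verts) - 2, 3)]

def edge(a, b, p):
    """Signed-area test: tells on which side of edge a->b the point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, width, height):
    """Fixed stage: yield pixels (fragments) whose centre lies inside tri."""
    a, b, c = tri
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
            inside_ccw = w0 >= 0 and w1 >= 0 and w2 >= 0
            inside_cw = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside_ccw or inside_cw:
                yield (x, y)

def fragment_processor(frag):
    """Programmable fragment stage: here, a constant red color."""
    return (255, 0, 0)

def render(vertices, width, height):
    """Run all stages and return a sparse framebuffer {pixel: color}."""
    framebuffer = {}
    transformed = [vertex_processor(v) for v in vertices]
    for tri in assemble_primitives(transformed):
        for frag in rasterize(tri, width, height):
            framebuffer[frag] = fragment_processor(frag)
    return framebuffer

# One triangle at depth z = 1, so projection leaves x and y unchanged.
pixels = render([(0, 0, 1), (4, 0, 1), (0, 4, 1)], width=4, height=4)
```

Swapping in a different `vertex_processor` or `fragment_processor` changes the result without touching the other stages, which is the point the "programmable" slides make.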
History of Graphics Hardware (1/3)
- Up to the mid 90s
  - SGI mainframes and workstations
  - PC: only 2D graphics hardware
- Mid 90s
  - Consumer 3D graphics hardware (PC)
    - 3dfx, NVIDIA, Matrox, ATI, ...
  - Triangle rasterization (only)
  - Cheap: pushed by the game industry
- 1999
  - PC card with TnL (Transform and Lighting)
    - NVIDIA GeForce: Graphics Processing Unit (GPU)
  - PC card more powerful than specialized workstations
[Figure: 3DFX Voodoo Graphics, 4 MB, 1997]
History of Graphics Hardware (2/3) https://www.zhihu.com/question/21980949
History of Graphics Hardware (3/3)
- Modern graphics hardware
  - Graphics pipeline partly programmable
  - Leaders: AMD (ATI) and NVIDIA
    - AMD Radeon HD 6990 and NVIDIA GeForce GTX 590
  - Game consoles similar to GPUs (Xbox)
Computational Power (1/2)
- GPUs are fast
  - 3.0 GHz Intel Core2 Duo (Woodcrest Xeon 5160):
    - Computation: 48 GFLOPS peak
    - Memory bandwidth: 21 GB/s peak
    - Price: $874 (chip)
  - NVIDIA GeForce 8800 GTX:
    - Computation: 330 GFLOPS observed
    - Memory bandwidth: 55.2 GB/s observed
    - Price: $599 (board)
- GPUs are getting faster, faster
  - CPUs: 1.4x annual growth
  - GPUs: 1.7x (pixels) to 2.3x (vertices) annual growth
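The figures on this slide can be turned into ratios with a few lines of arithmetic. The sketch below only reuses the numbers quoted above; note that it compares CPU peak values against GPU observed values, exactly as the slide does. The five-year horizon is an assumption chosen for illustration, and the growth rates are treated as constant yearly multipliers.

```python
# Figures quoted on the slide: CPU values are peak, GPU values are observed.
cpu = {"gflops": 48.0, "bandwidth_gbs": 21.0}    # 3.0 GHz Intel Core2 Duo
gpu = {"gflops": 330.0, "bandwidth_gbs": 55.2}   # NVIDIA GeForce 8800 GTX

compute_ratio = gpu["gflops"] / cpu["gflops"]
bandwidth_ratio = gpu["bandwidth_gbs"] / cpu["bandwidth_gbs"]

def compound(annual_rate, years):
    """Total speedup after `years` of growth at a constant yearly multiplier."""
    return annual_rate ** years

YEARS = 5  # assumed horizon, not from the slide
cpu_growth = compound(1.4, YEARS)
gpu_pixel_growth = compound(1.7, YEARS)
gpu_vertex_growth = compound(2.3, YEARS)

print(f"compute ratio GPU/CPU:   {compute_ratio:.2f}x")
print(f"bandwidth ratio GPU/CPU: {bandwidth_ratio:.2f}x")
print(f"after {YEARS} years: CPU {cpu_growth:.1f}x, "
      f"GPU pixels {gpu_pixel_growth:.1f}x, GPU vertices {gpu_vertex_growth:.1f}x")
```

Under these assumptions a 1.4x versus 1.7x yearly rate looks small, but compounding over a few years opens a gap of several times.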
Computational Power (2/2) GPU CPU Courtesy Naga Govindaraju
Flops Comparison on GPU and CPU
Memory Bandwidths Comparison of CPU and GPU
Motivation
- Why are GPUs getting faster so fast?
  - Arithmetic intensity: the specialized nature of GPUs makes it easier to use additional transistors for computation
  - Economics: the multi-billion dollar video game market is a pressure cooker that drives innovation to exploit this property
Flexible and Precise
- Modern GPUs are deeply programmable
  - Programmable pixel, vertex, and geometry engines
  - Solid high-level language support
- Modern GPUs support real precision
  - 32-bit/64-bit floating point throughout the pipeline
  - High enough for many applications
  - DX10-class GPUs add 32-bit integers
Graphics Hardware Consideration (1/2)
- GPU = Graphics Processing Unit
- Vector processor
  - Operates on 4-tuples
    - Position (x, y, z, w)
    - Color (red, green, blue, alpha)
    - Texture coordinates (s, t, r, q)
  - 4-tuple ops in 1 clock cycle
    - SIMD (Single Instruction, Multiple Data)
    - ADD, MUL, SUB, DIV, MADD, ...
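A 4-tuple SIMD operation applies one instruction to all four components at once. The sketch below imitates that in plain Python; the function names `add`, `mul` and `madd` are illustrative stand-ins for the hardware instructions listed on the slide, and the loop over components is what the hardware would do in a single clock cycle.

```python
# Component-wise operations on 4-tuples, mimicking one SIMD instruction each.

def add(a, b):
    """ADD: a + b on every component."""
    return tuple(x + y for x, y in zip(a, b))

def mul(a, b):
    """MUL: a * b on every component."""
    return tuple(x * y for x, y in zip(a, b))

def madd(a, b, c):
    """MADD: a * b + c on every component."""
    return tuple(x * y + z for x, y, z in zip(a, b, c))

# Position (x, y, z, w): scale by 2 and shift by 0.5, leaving w untouched.
position = (1.0, 2.0, 3.0, 1.0)
scale = (2.0, 2.0, 2.0, 1.0)
offset = (0.5, 0.5, 0.5, 0.0)
moved = madd(position, scale, offset)

# Color (red, green, blue, alpha): darken RGB by half, keep alpha.
color = (1.0, 0.5, 0.25, 1.0)
dimmed = mul(color, (0.5, 0.5, 0.5, 1.0))
```

The same three functions serve positions, colors and texture coordinates alike, which is why a 4-wide vector unit fits graphics workloads so well.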
Graphics Hardware Consideration (2/2)
- Pipelining: number of stages
- Parallelism: number of parallel processes
- Parallelism + pipelining: number of parallel pipelines
[Figures: stages and processes labelled 1 to 3 for each of the three cases]
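The effect of the three options can be compared with an idealized cycle count. The model below is an assumption for illustration only: every stage takes exactly one cycle, there are no stalls, and work divides evenly across units. Real hardware deviates from all three assumptions.

```python
import math

def sequential(items, stages):
    """No pipelining, one unit: each item occupies all stages in turn."""
    return items * stages

def pipelined(items, stages):
    """One pipeline: fill it once, then one item completes per cycle."""
    return 0 if items == 0 else stages + items - 1

def parallel(items, stages, units):
    """Several non-pipelined units working side by side."""
    return math.ceil(items / units) * stages

def parallel_pipelined(items, stages, units):
    """Several pipelines working side by side."""
    return 0 if items == 0 else stages + math.ceil(items / units) - 1

ITEMS, STAGES, UNITS = 9, 3, 3  # example sizes, chosen arbitrarily
for name, cycles in [
    ("sequential", sequential(ITEMS, STAGES)),
    ("pipelining", pipelined(ITEMS, STAGES)),
    ("parallelism", parallel(ITEMS, STAGES, UNITS)),
    ("parallelism + pipelining", parallel_pipelined(ITEMS, STAGES, UNITS)),
]:
    print(f"{name}: {cycles} cycles")
```

With these example sizes the combined design finishes in the fewest cycles, which is the motivation for GPUs that run many pipelines side by side.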
Outline
- GPU Pipeline
- History of GPU Hardware
- GPU Hardware Consideration
- Modern GPU Hardware Architecture
  - NVIDIA GeForce
  - AMD (ATI) Radeon
  - IMG PowerVR
  - ARM Mali
- Summary
Growth of NVIDIA GPU
- Performance metrics: since 2000, the amount of horsepower applied to processing 3D vertices and fragments has been growing at a remarkable rate.
Growth of NVIDIA GPU
NVIDIA GeForce 7900 GTX
NVIDIA Graphics Card Architecture: GeForce 8 Series
- 12,288 concurrent threads, hardware managed
- 128 Thread Processor cores at 1.35 GHz == 518 GFLOPS peak
[Figure: block diagram. Host CPU -> Work Distribution -> 16 groups of IU, SP and Shared Memory; 8 TF units, each with a TEX L1; 6 L2 blocks; 6 Memory blocks]
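The peak figure on this slide can be reproduced with simple arithmetic. The core count, clock, thread count and the 16 IU/SP groups come from the slide; the factor of 3 floating-point operations per core per cycle (a multiply-add counted as two plus one extra multiply) is an assumption that is not stated on the slide but is the factor that matches the quoted 518 GFLOPS.

```python
# Values taken from the slide.
CORES = 128
CLOCK_GHZ = 1.35
CONCURRENT_THREADS = 12288
MULTIPROCESSORS = 16  # the 16 IU/SP groups in the block diagram

# Assumption (not on the slide): 3 flops per core per cycle,
# i.e. one multiply-add (2 flops) plus one multiply (1 flop).
FLOPS_PER_CORE_PER_CYCLE = 3

peak_gflops = CORES * CLOCK_GHZ * FLOPS_PER_CORE_PER_CYCLE
cores_per_multiprocessor = CORES // MULTIPROCESSORS
threads_per_multiprocessor = CONCURRENT_THREADS // MULTIPROCESSORS

print(f"peak: {peak_gflops:.1f} GFLOPS")
print(f"cores per multiprocessor: {cores_per_multiprocessor}")
print(f"threads per multiprocessor: {threads_per_multiprocessor}")
```

Each multiprocessor therefore keeps far more threads in flight than it has cores, which is how the hardware-managed threading hides memory latency.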
NVIDIA FERMI
FERMI: Streaming Multiprocessor (SM)
- Each SM contains
  - 32 cores
  - 16 load/store units
  - 32,768 registers
- Newer FP representation
  - IEEE 754-2008
- Two units
  - Floating point
  - Integer
FERMI: Results
FERMI: Comparison
Kepler: Core Architecture http://www.weistang.com/article-941-1.html
Maxwell: Core Architecture http://www.weistang.com/article-941-1.html http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B
Kepler vs Maxwell Comparison 2012 2014 http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B
Pascal: Core Architecture https://read01.com/zh-tw/oemme4.html#.wi5f30qwyps
Volta: Core Architecture http://technews.tw/2017/05/11/nvidia-gpu-volta/
Pascal vs Volta Comparison 2016 2017 http://technews.tw/2017/05/11/nvidia-gpu-volta/
https://zh.wikipedia.org/wiki/cuda 09/02/11
NVIDIA ULP-GeForce (Tegra2)
NVIDIA ULP-GeForce (Tegra3)
Tegra Roadmap
Mobile Roadmap http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-keplerinto-the-tablet-market-discarded-palm-machine-changes-to-core-login-tabledrawing-tablet?page=2
ATI Radeon X1900 XTX
- Features of ATI Radeon X1900 XTX
  - Core speed 650 MHz
  - 48 pixel shader processors
  - 8 vertex shader processors
  - 51 GB/s memory bandwidth
  - 512 MB memory
http://product.pcpop.com/000024721/index.html
Parallel Processes: ATI Radeon X1900 XTX
- High memory bandwidth
[Figure: system block diagram. Graphics card: GPU 650 MHz; high bandwidth 51 GB/s; graphics memory ½ GB; output. Processor chip: CPU 3 GHz; high bandwidth 77 GB/s; cache ½ MB. Remaining labels, in source order: 3 GB/s; AGP bus; 2 GB/s; AGP memory ½ GB; main memory 1 GB]
ATI Radeon 9700
- Parallelism + pipelining: ATI Radeon 9700
  - 4 vertex pipelines
  - 8 pixel pipelines
Radeon Comparison http://www.pcdiy.com.tw/detail/4275
IMG PowerVR Series5XT (SGXMP)
IMG PowerVR Series5XT (SGXMP)
- Shader-driven Tile-Based Deferred Rendering (TBDR) architecture
- Fully programmable GPU using the unique USSE architecture
- All SGX cores support OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX 9/10.1
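The first step of any tile-based renderer is to sort geometry into screen tiles so that each tile can later be processed from fast on-chip memory. The sketch below shows only that binning step, using a conservative bounding-box test. It is a generic illustration and not a description of the PowerVR hardware: the 32-pixel tile size and the function name `bin_triangles` are assumptions, and the deferred part of TBDR (resolving visibility per tile before shading) is not modelled.

```python
import math

TILE = 32  # tile edge in pixels; an assumed value for illustration

def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles whose
    screen-space bounding box overlaps that tile."""
    tiles_x = math.ceil(width / tile)
    tiles_y = math.ceil(height / tile)
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Clamp the bounding box to the tiles that exist on screen.
        tx0 = max(math.floor(min(xs) / tile), 0)
        tx1 = min(math.floor(max(xs) / tile), tiles_x - 1)
        ty0 = max(math.floor(min(ys) / tile), 0)
        ty1 = min(math.floor(max(ys) / tile), tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins

# Two screen-space triangles on a 64x64 target (2x2 tiles of 32 pixels).
triangles = [
    [(2, 2), (20, 5), (10, 25)],     # fits inside the top-left tile
    [(10, 10), (50, 10), (30, 50)],  # spans all four tiles
]
bins = bin_triangles(triangles, width=64, height=64)
```

After binning, a renderer of this kind would visit the tiles one by one and touch only the triangles listed for each, which keeps the working set small.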
IMG PowerVR Series6 (Rogue)
IMG PowerVR Series6 (Rogue)
- Supports OpenGL ES 3.0, OpenGL ES 2.0, OpenGL 3.x/4.x, OpenCL 1.x and DirectX 10, with certain family members extending their capabilities to full WHQL-compliant DirectX 11.1 functionality
IMG PowerVR 7XT Plus http://imgtec.eetrend.com/article/7130
Features of ARM Mali
ARM Mali-200
ARM Mali-300
ARM Mali-400MP
ARM Mali-450MP
ARM Mali-T604
ARM Mali-T604
- GPGPU (supports OpenCL 1.1)
- Tri-pipe architecture
- The first GPU based on the Midgard architecture
- True IEEE double-precision floating-point math in hardware for Full Profile
- The Job Manager within Mali-T600 Series GPUs offloads task management from the CPU to the GPU
- 5x performance improvement over previous Mali graphics processors
ARM Mali-T624 9/10/2018
ARM Mali-T678
ARM Mali-T678
- 50% performance improvement compared to the Mali-T658
ARM Mali-T760
ARM Mali-T880
ARM Mali Comparison https://zh.wikipedia.org/wiki/mali_(gpu)
Applications (1/7)
- Includes lots of applications
  - Ray tracer
  - Image segmentation
  - FFT/linear algebra
http://f.fwallpapers.com/images/3d-bunny.jpg
http://graphics.stanford.edu/data/3dscanrep/stanford-bunny-cebal-ssh.jpg
Applications (2/7) http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-keplerinto-the-tablet-market-discarded-palm-machine-changes-to-core-login-tabledrawing-tablet?page=2
Applications (3/7) http://5pit.tw/tech/computer/tid_12880
Applications (4/7) http://wechatinchina.com/thread-461154-1-1.html
Applications (5/7) https://read01.com/pnd3d.html
Applications (6/7) AR and VR Applications http://wechatinchina.com/thread-461154-1-1.html
Applications (7/7) http://www.naipo.com/portals/1/web_tw/knowledge_center/industry_economy/publish-482.htm
Can GPUs Solve ALL Problems?
Summary
- Understand the GPU pipeline in depth
- Understand the motivation for GPU hardware
- Understand modern GPU hardware architecture and specifications
- Understand GPU/GPGPU applications and key problems
References
- GPU Architecture & CG, Mark Colbert, 2006
- Introduction to Graphics Hardware and GPUs, Yannick Francken, Tom Mertens
- GPU Tutorial, Yiyunjin, 2007
- Evolution of GPU and Graphics Pipelining, Weijun Xiao
- Commercial product websites (NVIDIA, ATI, IMG, ARM)
- SIGGRAPH 2005 Course Notes, David Luebke; adapted from David Luebke (University of Virginia) and NVIDIA
- MCS 572 Lecture 27, Introduction to Supercomputing, Jan Verschelde, 17 March 2014
Acknowledgement: Thanks for the TA's help in preparing the material.