Introduction to Modern GPU Hardware

The following content is extracted from the material listed in the references on the last page. If any citation is wrong or any reference is missing, please contact ldvan@cs.nctu.edu.tw and I will correct the error as soon as possible. This material is for course use only; please do NOT broadcast it. Thank you.

Lan-Da Van (范倫達), Ph.D.
Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan
Fall 2018

Outline
GPU Pipeline
GPU Hardware History
GPU Hardware Consideration
Modern GPU Hardware Architecture
  NVIDIA GeForce
  AMD (ATI) Radeon
  IMG PowerVR
  ARM Mali
GPU Applications
Summary

GPU Fundamentals: Graphics Pipeline
[Figure: a simplified graphics pipeline. CPU side: Application. GPU side: Transform & Light, Assemble Primitives, Rasterize, Shade, plus Video Memory (Textures) with a render-to-texture path. Data passed between stages: Graphics State, Vertices (3D), Xformed, Lit Vertices (2D), Screenspace triangles (2D), Fragments (pre-pixels), Final Pixels (Color, Depth).]
Note that pipe widths vary.
Many caches, FIFOs, and so on are not shown.

GPU Fundamentals: Modern Graphics Pipeline
[Figure: the same pipeline with the Transform & Light stage replaced by a Vertex Processor and the Shade stage replaced by a Fragment Processor. CPU side: Application. GPU side: Vertex Processor, Assemble Primitives, Rasterize, Fragment Processor, Video Memory (Textures), render-to-texture path.]
Programmable vertex processor!
Programmable pixel processor!

GPU Fundamentals: Modern Graphics Pipeline
[Figure: the pipeline with a Geometry Processor in the primitive assembly stage. CPU side: Application. GPU side: Vertex Processor, Geometry Processor, Rasterize, Fragment Processor, Video Memory (Textures), render-to-texture path.]
Programmable primitive assembly!
More flexible memory access!
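The stages named on the pipeline slides above can be mimicked in a few lines of software. The following is only a toy sketch in Python, not how any GPU implements these stages: the function names, the orthographic transform, the 8x8 framebuffer and the flat red color are all assumptions made for illustration.

```python
# Toy software model of the pipeline stages named on the slides.
# Every name and constant here is hypothetical and chosen for illustration.

def vertex_stage(vertices, scale, offset):
    """Vertex processor: map 3D vertices to 2D screen space (toy orthographic transform)."""
    return [(x * scale + offset, y * scale + offset) for (x, y, z) in vertices]

def assemble_triangles(screen_vertices):
    """Primitive assembly: group every three consecutive vertices into one triangle."""
    return [tuple(screen_vertices[i:i + 3])
            for i in range(0, len(screen_vertices) - 2, 3)]

def edge(a, b, p):
    """Signed-area test: the sign tells on which side of edge a -> b the point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(triangle, width, height):
    """Rasterizer: emit one fragment per pixel whose center is covered by the triangle."""
    a, b, c = triangle
    fragments = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            w0, w1, w2 = edge(a, b, p), edge(b, c, p), edge(c, a, p)
            # Accept either winding order.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                fragments.append((x, y))
    return fragments

def fragment_stage(fragments, color):
    """Fragment processor: assign a final color to each fragment (flat shading here)."""
    return {frag: color for frag in fragments}

def render(vertices, width=8, height=8):
    """Run all stages in order and return a dict mapping pixel -> color."""
    framebuffer = {}
    screen = vertex_stage(vertices, scale=4.0, offset=4.0)
    for tri in assemble_triangles(screen):
        framebuffer.update(fragment_stage(rasterize(tri, width, height), (255, 0, 0)))
    return framebuffer

if __name__ == "__main__":
    triangle = [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (-1.0, 1.0, 0.0)]
    fb = render(triangle)
    for y in range(8):
        print("".join("#" if (x, y) in fb else "." for x in range(8)))
```

Each function stands in for one box of the diagram, so swapping `vertex_stage` or `fragment_stage` for a different function is the software analogue of the programmable vertex and pixel processors highlighted on the slides.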

History of Graphics Hardware (1/3)
Up to the mid 90s
  SGI mainframes and workstations
  PC: only 2D graphics hardware
Mid 90s
  Consumer 3D graphics hardware (PC): 3dfx, NVIDIA, Matrox, ATI, ...
  Triangle rasterization (only)
  Cheap: pushed by game industry
  [Figure: 3DFX Voodoo graphics 4MB, 1997]
1999
  PC-card with TnL (Transform and Lighting)
  NVIDIA GeForce: Graphics Processing Unit (GPU)
  PC-card more powerful than specialized workstations

History of Graphics Hardware (2/3) https://www.zhihu.com/question/21980949

History of Graphics Hardware (3/3)
Modern graphics hardware
  Graphics pipeline partly programmable
  Leaders: AMD (ATI) and NVIDIA
    AMD Radeon HD 6990 and NVIDIA GeForce GTX 590
  Game consoles similar to GPUs (Xbox)

Computational Power (1/2)
GPUs are fast
  3.0 GHz Intel Core2 Duo (Woodcrest Xeon 5160):
    Computation: 48 GFLOPS peak
    Memory bandwidth: 21 GB/s peak
    Price: $874 (chip)
  NVIDIA GeForce 8800 GTX:
    Computation: 330 GFLOPS observed
    Memory bandwidth: 55.2 GB/s observed
    Price: $599 (board)
GPUs are getting faster, faster
  CPUs: 1.4x annual growth
  GPUs: 1.7x (pixels) to 2.3x (vertices) annual growth
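The numbers on this slide can be turned into ratios and compounded growth with a little arithmetic. The sketch below uses only the slide's figures; the variable names are illustrative, and reading the growth figures as yearly multipliers is an assumption. Note that the comparison mixes peak values (CPU) with observed values (GPU), so the ratios are rough.

```python
# Figures copied from the slide; names are illustrative.
CPU_GFLOPS_PEAK = 48.0      # Intel Core2 Duo (Woodcrest Xeon 5160)
CPU_BANDWIDTH_PEAK = 21.0   # GB/s
GPU_GFLOPS_OBSERVED = 330.0     # NVIDIA GeForce 8800 GTX
GPU_BANDWIDTH_OBSERVED = 55.2   # GB/s

def ratio(gpu_value, cpu_value):
    """How many times larger the GPU figure is than the CPU figure."""
    return gpu_value / cpu_value

def compounded_growth(annual_factor, years):
    """Total speed-up after `years`, assuming the same multiplier every year."""
    return annual_factor ** years

compute_ratio = ratio(GPU_GFLOPS_OBSERVED, CPU_GFLOPS_PEAK)          # about 6.9x
bandwidth_ratio = ratio(GPU_BANDWIDTH_OBSERVED, CPU_BANDWIDTH_PEAK)  # about 2.6x

if __name__ == "__main__":
    print(f"compute ratio:   {compute_ratio:.2f}x")
    print(f"bandwidth ratio: {bandwidth_ratio:.2f}x")
    for label, factor in [("CPU", 1.4), ("GPU pixels", 1.7), ("GPU vertices", 2.3)]:
        print(f"{label}: {compounded_growth(factor, 5):.1f}x after 5 years")
```

Under these assumptions, five years of growth gives roughly 5.4x for CPUs against roughly 14x to 64x for GPUs, which is what "getting faster, faster" amounts to.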

Computational Power (2/2)
[Figure comparing GPU and CPU. Courtesy Naga Govindaraju]

Flops Comparison on GPU and CPU

Memory Bandwidths Comparison of CPU and GPU

Motivation
Why are GPUs getting faster so fast?
  Arithmetic intensity: the specialized nature of GPUs makes it easier to use additional transistors for computation
  Economics: the multi-billion dollar video game market is a pressure cooker that drives innovation to exploit this property

Flexible and Precise
Modern GPUs are deeply programmable
  Programmable pixel, vertex, and geometry engines
  Solid high-level language support
Modern GPUs support real precision
  32-bit/64-bit floating point throughout the pipeline
  High enough for many applications
  DX10-class GPUs add 32-bit integers

Graphics Hardware Consideration (1/2)
GPU = Graphics Processing Unit
Vector processor
  Operates on 4-tuples
    Position (x, y, z, w)
    Color (red, green, blue, alpha)
    Texture coordinates (s, t, r, q)
  4-tuple ops, 1 clock cycle
    SIMD (Single Instruction Multiple Data)
    ADD, MUL, SUB, DIV, MADD, ...
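The 4-tuple operations listed above can be illustrated with a small sketch. This Python version only models the semantics (one operation applied to all four components); it runs the components one after another, whereas the slide's point is that the hardware does all four in a single clock cycle. The function names and sample values are assumptions made for the example.

```python
# Semantic model of SIMD operations on 4-tuples; names are illustrative.

def add(a, b):
    """ADD: component-wise a + b."""
    return tuple(x + y for x, y in zip(a, b))

def mul(a, b):
    """MUL: component-wise a * b."""
    return tuple(x * y for x, y in zip(a, b))

def madd(a, b, c):
    """MADD: component-wise a * b + c."""
    return tuple(x * y + z for x, y, z in zip(a, b, c))

if __name__ == "__main__":
    position = (1.0, 2.0, 3.0, 1.0)   # (x, y, z, w)
    scale = (2.0, 2.0, 2.0, 1.0)
    offset = (0.5, 0.5, 0.5, 0.0)
    print(madd(position, scale, offset))  # scale, then translate, in one operation

    color = (1.0, 0.5, 0.25, 1.0)     # (red, green, blue, alpha)
    tint = (0.5, 0.5, 0.5, 1.0)
    print(mul(color, tint))           # darken the color, keep alpha
```

The same three functions work unchanged on positions, colors and texture coordinates, which is why a 4-wide vector unit suits graphics workloads.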

Graphics Hardware Consideration (2/2)
Pipelining: number of stages
Parallelism: number of parallel processes
Parallelism + pipelining: number of parallel pipelines
[Figure: diagrams of a 3-stage pipeline, 3 parallel processes, and 3 parallel 3-stage pipelines]
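The benefit of combining the two ideas can be estimated with a simple cycle count. The sketch below uses an idealized textbook model, which is an assumption and not a measurement of real hardware: every stage takes one cycle, there are no stalls, and work divides evenly across pipelines. The function names are illustrative.

```python
import math

def cycles_sequential(items, stages):
    """No pipelining, no parallelism: each item passes through all stages alone."""
    return items * stages

def cycles_parallel(items, stages, processes):
    """Parallelism only: `processes` units each handle whole items, one at a time."""
    return math.ceil(items / processes) * stages

def cycles_pipelined(items, stages, pipelines=1):
    """Pipelining (optionally across several parallel pipelines).

    The first item needs `stages` cycles to fill the pipeline; after that each
    pipeline completes one item per cycle.
    """
    if items == 0:
        return 0
    return stages + math.ceil(items / pipelines) - 1

if __name__ == "__main__":
    items, stages = 9, 3
    print("sequential:           ", cycles_sequential(items, stages))
    print("3 parallel processes: ", cycles_parallel(items, stages, 3))
    print("1 pipeline:           ", cycles_pipelined(items, stages))
    print("3 parallel pipelines: ", cycles_pipelined(items, stages, 3))
```

For 9 items and 3 stages this model gives 27 cycles sequentially, 11 with one pipeline, 9 with three parallel processes, and 5 with three parallel pipelines.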

Growth of NVIDIA GPU
Performance metrics
Since 2000, the amount of horsepower applied to processing 3D vertices and fragments has been growing at a remarkable rate.

Growth of NVIDIA GPU

NVIDIA GeForce 7900 GTX

NVIDIA Graphics Card Architecture
GeForce-8 Series
  12,288 concurrent threads, hardware managed
  128 Thread Processor cores at 1.35 GHz == 518 GFLOPS peak
[Figure: block diagram. Labels: Host CPU, Work Distribution; IU, SP and Shared Memory (16 each); TF and TEX L1 (8 each); L2 and Memory (6 each)]
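The peak figure on this slide can be reproduced with a short calculation. The sketch below assumes the commonly cited accounting of 3 floating-point operations per core per clock (a multiply-add counted as 2, plus a multiply); that factor is not stated on the slide. The count of 16 multiprocessors is read off the 16 Shared Memory blocks in the diagram and is likewise an assumption.

```python
# Figures from the slide.
CORES = 128
CLOCK_GHZ = 1.35
CONCURRENT_THREADS = 12288

# Assumptions, not stated on the slide.
FLOPS_PER_CORE_PER_CLOCK = 3   # multiply-add (2 flops) + multiply (1 flop)
MULTIPROCESSORS = 16           # number of Shared Memory blocks in the diagram

def peak_gflops(cores, clock_ghz, flops_per_clock):
    """Peak throughput = cores x clock (GHz) x flops issued per core per clock."""
    return cores * clock_ghz * flops_per_clock

peak = peak_gflops(CORES, CLOCK_GHZ, FLOPS_PER_CORE_PER_CLOCK)
threads_per_multiprocessor = CONCURRENT_THREADS // MULTIPROCESSORS
cores_per_multiprocessor = CORES // MULTIPROCESSORS

if __name__ == "__main__":
    print(f"peak: {peak:.1f} GFLOPS")
    print(f"threads per multiprocessor: {threads_per_multiprocessor}")
    print(f"cores per multiprocessor:   {cores_per_multiprocessor}")
```

Under these assumptions the result is 518.4 GFLOPS, matching the slide's 518 GFLOPS, with 768 threads and 8 cores per multiprocessor.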

NVIDIA FERMI

FERMI: Streaming Multiprocessor (SM)
Each SM contains
  32 cores
  16 load/store units
  32,768 registers
Newer FP representation
  IEEE 754-2008
Two units
  Floating point
  Integer

FERMI: Results

FERMI: Comparison

Kepler: Core Architecture http://www.weistang.com/article-941-1.html

Maxwell: Core Architecture http://www.weistang.com/article-941-1.html http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B

Kepler (2012) vs Maxwell (2014) Comparison http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B

Pascal: Core Architecture https://read01.com/zh-tw/oemme4.html#.wi5f30qwyps

Volta: Core Architecture http://technews.tw/2017/05/11/nvidia-gpu-volta/

Pascal (2016) vs Volta (2017) Comparison http://technews.tw/2017/05/11/nvidia-gpu-volta/

https://zh.wikipedia.org/wiki/cuda 09/02/11

NVIDIA ULP-GeForce (Tegra2)

NVIDIA ULP-GeForce (Tegra3)

Tegra Roadmap

Mobile Roadmap http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-keplerinto-the-tablet-market-discarded-palm-machine-changes-to-core-login-tabledrawing-tablet?page=2

ATI Radeon X1900 XTX
Features of ATI Radeon X1900 XTX
  Core speed 650 MHz
  48 pixel shader processors
  8 vertex shader processors
  51 GB/s memory bandwidth
  512 MB memory
http://product.pcpop.com/000024721/index.html

Parallel Processes: ATI Radeon X1900 XTX
High Memory Bandwidth
[Figure: memory bandwidth diagram. Graphics card: GPU (650 MHz) connected at 51 GB/s to graphics memory (½ GB), with output. Processor chip: CPU (3 GHz) connected at 77 GB/s to cache (½ MB). Other labels: 3 GB/s, AGP bus 2 GB/s, AGP memory ½ GB, main memory 1 GB.]

ATI Radeon 9700
Parallelism + pipelining: ATI Radeon 9700
  4 vertex pipelines
  8 pixel pipelines

Radeon Comparison http://www.pcdiy.com.tw/detail/4275

IMG PowerVR Series5XT (SGXMP)
Shader-driven Tile-Based Deferred Rendering (TBDR) architecture
Fully programmable GPU using unique USSE architecture
All SGX cores support OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX 9/10.1

IMG PowerVR Series6 (Rogue)
Supports OpenGL ES 3.0, OpenGL ES 2.0, OpenGL 3.x/4.x, OpenCL 1.x and DirectX 10, with certain family members extending their capabilities to full WHQL-compliant DirectX 11.1 functionality

IMG PowerVR 7XT Plus http://imgtec.eetrend.com/article/7130

Features of ARM Mali

ARM Mali-200

ARM Mali-300

ARM Mali-400MP

ARM Mali-450MP

ARM Mali-T604
GPGPU (supports OpenCL 1.1)
Tri-pipe architecture
The first GPU based on the Midgard architecture
True IEEE double-precision floating-point math in hardware for Full Profile
The Job Manager within Mali-T600 Series GPUs offloads task management from the CPU to the GPU
5x performance improvement over previous Mali graphics processors

ARM Mali-T624

ARM Mali-T678
50% performance improvement compared to the Mali-T658

ARM Mali-T760

ARM Mali-T880

ARM Mali Comparison https://zh.wikipedia.org/wiki/mali_(gpu)

Applications (1/7)
Includes lots of applications
  Ray-tracer
  Image segmentation
  FFT/Linear Algebra
http://f.fwallpapers.com/images/3d-bunny.jpg
http://graphics.stanford.edu/data/3dscanrep/stanford-bunny-cebal-ssh.jpg

Applications (2/7) http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-keplerinto-the-tablet-market-discarded-palm-machine-changes-to-core-login-tabledrawing-tablet?page=2

Applications (3/7) http://5pit.tw/tech/computer/tid_12880

Applications (4/7) http://wechatinchina.com/thread-461154-1-1.html

Applications (5/7) https://read01.com/pnd3d.html

Applications (6/7) AR and VR Applications http://wechatinchina.com/thread-461154-1-1.html

Applications (7/7) http://www.naipo.com/portals/1/web_tw/knowledge_center/industry_economy/publish-482.htm

Do GPUs Solve ALL Problems?

Summary
Understand the GPU pipeline in depth
Understand the motivation of GPU hardware
Understand modern GPU hardware architecture and specifications
Understand GPU/GPGPU applications and key problems

Reference
GPU Architecture & CG, Mark Colbert, 2006
Introduction to Graphics Hardware and GPUs, Yannick Francken, Tom Mertens
GPU Tutorial, Yiyunjin, 2007
Evolution of GPU and Graphics Pipelining, Weijun Xiao
Commercial product websites (NVIDIA, ATI, IMG, ARM)
SIGGRAPH 2005 Course Notes, David Luebke
Adapted from: David Luebke (University of Virginia) and NVIDIA
Jan Verschelde, MCS 572 Lecture 27, Introduction to Supercomputing, 17 March 2014

Acknowledgement: Thanks for the TA's help in preparing the material.