Bifrost - The GPU architecture for next five billion
- Godwin Shaw
Transcription
1 Bifrost - The GPU architecture for next five billion. Hessed Choi, Senior FAE, ARM. ARM Tech Forum, June 28th, 2016
2 Vulkan 2 ARM 2016
3 What is Vulkan? A 3D graphics API for the next twenty years: logical successor to OpenGL and OpenGL ES; modern, efficient design; open, industry-controlled standard. Here, now: released in February, with unprecedented support; available today for desktop Windows and Linux; officially supported in Android N; shipping today in Samsung Galaxy S7; engaged, active developer community.
4 Why ARM loves Vulkan. A great fit for mobile graphics architectures: no wasted effort trying to look like a desktop GPU; designed to enable mobile-specific optimizations. Radical commitment to efficiency: CPU load is greatly reduced, even on a single core. Makes your multi-core CPU more useful: driver work can be distributed across many threads, which helps performance and power. Makes your multi-core GPU more useful too: easier for applications to keep a powerful GPU busy.
5 Bifrost
6 Bifrost: The new GPU architecture. The increasing pixel impact of modern mobile gaming continues to drive innovation. 2010: Utgard; 2013: Midgard; 2016: Bifrost.
7 ARM Mali processor generations. Bifrost (Mali-G71 GPU): unified shader cores, scalar ISA, clause execution, full coherency, Vulkan, OpenCL. Midgard (Mali-T600, Mali-T700 and Mali-T800 GPU series): unified shader cores, SIMD ISA, OpenGL ES 3.x, OpenCL, Vulkan. Utgard (Mali-200, Mali-300, Mali-400, Mali-450 and Mali-470 GPUs): separate shader cores, SIMD ISA, OpenGL ES 2.x.
8 Mali-G71 efficiency drives performance. 20% higher energy efficiency*; 40% better performance density*; 20% bandwidth improvement*; 32 cores. Optimized for next-generation, advanced, real-world content. *Compared to Mali-T880, on the same process node under the same conditions.
9 Bifrost features. A more efficient architecture: more performance overall, per mm² and per line of real-world shader code. Major shader core redesign: new scalar, clause-based ISA; new quad-based arithmetic units; new core fabric. New geometry data flow: reduces memory bandwidth and footprint. 1.5x performance improvement.
10 Architectural innovations
11 Bifrost architectural innovations. Energy efficiency: claused shaders; Index Driven Vertex Shading; wire light pipelines. Developer friendly: designed for Vulkan and VR/AR. Heterogeneous computing: full system coherency. [Diagram labels: Midgard, Bifrost, CPU, CPU, GPU, Coherent Interconnect, DRAM]
12 Bifrost GPU design. [Block diagram: Driver Software; Job Manager; Core 0, Core 1, Core 2 through Core 31; Control Fabric; Tiler; MMU; three L2 Cache Segments; three AXI Memory Buses]
13 Scalable system design. Up to 32 shader cores supported. [Same block diagram as slide 12]
14 Execution core improvements. [Same block diagram as slide 12]
15 Bifrost core design
16 Bifrost core design. [Block diagram: Compute Frontend; Fragment Frontend; two Quad Creators; Quad Manager; Quad Control; Execution Engine 0, Execution Engine 1, Execution Engine 2; three Quad State blocks; Control Fabric; Load/store Unit; Attribute Unit; Varying Unit; Texture Unit; Blender & Tile Access; Depth & Stencil; ZS Memory; Tile Memory; Tile Writeback; three paths to the L2 memory system]
17 Quad vectorization. Bifrost uses quad-parallel execution: four scalar threads are executed in lockstep in a quad; one quad at a time executes in each pipeline stage; each thread fills one 32-bit lane of the hardware; 4 threads doing a vec3 FP32 add takes 3 cycles; improves utilization. [Diagram: Lane 0 to Lane 3 carry threads T0 to T3; Cycle 1 processes T0.x T1.x T2.x T3.x; Cycle 2 processes T0.y T1.y T2.y T3.y; Cycle 3 processes T0.z T1.z T2.z T3.z] Quad vectorization is compiler friendly: each thread only sees a stream of scalar operations; vector operations can always be split into scalars.
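The lane-and-cycle layout on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch of the scheduling idea (one scalar component per cycle, one thread per lane); the function name quad_schedule is made up for this example and does not come from any ARM tool.

```python
from typing import List, Tuple

QUAD_WIDTH = 4  # four scalar threads run in lockstep, as stated on the slide


def quad_schedule(components: str = "xyz") -> List[Tuple[int, List[str]]]:
    """Return (cycle, per-lane work) pairs for one quad.

    Each vector component becomes one scalar operation, and each scalar
    operation occupies all four lanes for one cycle (hypothetical model).
    """
    schedule = []
    for cycle, comp in enumerate(components, start=1):
        lanes = [f"T{thread}.{comp}" for thread in range(QUAD_WIDTH)]
        schedule.append((cycle, lanes))
    return schedule


if __name__ == "__main__":
    for cycle, lanes in quad_schedule():
        print(f"Cycle {cycle}: " + " ".join(lanes))
```

Under this model a vec3 add takes three cycles and a vec4 add would take four, while all four lanes stay occupied in every cycle.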
18 Clause execution. Back-to-back execution guaranteed within a clause; allows aggressive optimisation. [Diagram legend: Overhead, Instruction]
19 Clause execution. [Diagram: register file R0 to R7 shown for the sequence ADD R2, R0, R1; ADD R4, R2, R3; ADD R0, R4, R5] Back-to-back register access is common: the result from one instruction is often only used as input to the next.
20 Clause execution. [Diagram: register file R0 to R7 plus a temporary T, shown for the sequence ADD T, R0, R1; ADD T, T, R3; ADD R0, T, R5] Back-to-back register access is common. Register file bypass saves power and allows use of simpler, smaller register files.
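To make the saving concrete, the two instruction sequences from slides 19 and 20 can be compared by counting main register file accesses. The sketch below assumes that any operand named T bypasses the main register file entirely; the helper register_file_accesses is hypothetical and only models this counting, not the actual hardware.

```python
from typing import List, Tuple

Instr = Tuple[str, str, str]  # (destination, source A, source B)


def register_file_accesses(clause: List[Instr], temp: str = "T") -> Tuple[int, int]:
    """Count (reads, writes) that touch the main register file.

    Operands named `temp` are assumed to be forwarded through a temporary
    register and therefore never reach the main register file.
    """
    reads = sum(1 for _, a, b in clause for src in (a, b) if src != temp)
    writes = sum(1 for dst, _, _ in clause if dst != temp)
    return reads, writes


# Sequences as shown on the slides.
plain = [("R2", "R0", "R1"), ("R4", "R2", "R3"), ("R0", "R4", "R5")]
bypassed = [("T", "R0", "R1"), ("T", "T", "R3"), ("R0", "T", "R5")]

if __name__ == "__main__":
    print(register_file_accesses(plain))     # (6, 3)
    print(register_file_accesses(bypassed))  # (4, 1)
```

In this example the bypassed clause needs four reads and one write instead of six reads and three writes.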
21 Clause scheduling. Delay next clause if asynchronous data not ready. [Diagram: a TEX instruction starts a texture unit operation and a later clause uses the result; annotations: Unrelated? Required data not ready?; legend: Overhead, Instruction]
22 Clause scheduling. Another quad can use this execution unit: high utilization, high efficiency. [Diagram: a TEX instruction starts a texture unit operation and a later clause uses the result; legend: Overhead, Quad 1, Quad 2]
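The effect of letting a second quad run while the first waits for texture data can be illustrated with a toy scheduler. Everything numeric here is an assumption for illustration: the 4-cycle clauses, the 8-cycle texture latency, and the pick-lowest-ready-quad policy are invented, and simulate is a hypothetical helper rather than a model of the real Mali-G71 scheduler.

```python
from typing import List, Tuple

Clause = Tuple[int, int]  # (busy cycles, wait cycles before the quad's next clause)


def simulate(quads: List[List[Clause]]) -> Tuple[int, int]:
    """Run all quads on one execution engine; return (total cycles, busy cycles).

    A clause runs to completion once started. After it finishes, the owning
    quad cannot start its next clause until `wait` cycles have passed.
    """
    next_clause = [0] * len(quads)
    ready_at = [0] * len(quads)
    now = 0
    busy = 0
    while True:
        pending = [q for q in range(len(quads)) if next_clause[q] < len(quads[q])]
        if not pending:
            return now, busy
        ready = [q for q in pending if ready_at[q] <= now]
        if not ready:
            # Nothing can run: the engine idles until the earliest quad is ready.
            now = min(ready_at[q] for q in pending)
            continue
        q = ready[0]
        cycles, wait = quads[q][next_clause[q]]
        now += cycles
        busy += cycles
        ready_at[q] = now + wait
        next_clause[q] += 1


# Hypothetical shader: a clause that issues TEX, then a clause that uses the result.
shader = [(4, 8), (4, 0)]

if __name__ == "__main__":
    for count in (1, 2):
        total, busy = simulate([shader] * count)
        print(f"{count} quad(s): {busy}/{total} cycles busy")
```

With these made-up numbers a single quad keeps the engine busy for 8 of 16 cycles, while two quads keep it busy for 16 of 20.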
23 Arithmetic functional units
24 Bifrost arithmetic functional units. Executes quad-parallel scalar operations: 4x32-bit multiplier (FMA); 4x32-bit adder (ADD); adder includes special function unit. Smaller and more area efficient. Simplified layout eases compilation: better scheduling in today's code; better utilization; one instruction word contains two instructions. [Diagram: Main Regs Read, FMA, ADD/SF, Main Regs Write, with Temp Registers]
25 Bifrost arithmetic functional units. Retains support for smaller width data types: integers useful for deep learning; 2x performance for FP16, useful for pixel shaders. [Diagram: one 32-bit lane holds four int8 (8-bit integers), two int16 (16-bit integers), one int32 (32-bit integers), two float16 (16-bit floating point) or one float32 (32-bit floating point); datapath: Main Regs Read, FMA, ADD/SF, Main Regs Write, with Temp Registers]
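The packing shown in the diagram is simple arithmetic: a 32-bit lane holds as many elements as fit in 32 bits. The sketch below only counts packed elements per lane and per quad; whether every operation reaches that rate is not stated on the slide (it claims 2x only for FP16), and the names used are hypothetical.

```python
LANE_BITS = 32   # each thread fills one 32-bit lane (from the slides)
QUAD_WIDTH = 4   # four threads per quad

# Element widths in bits for the data types listed on the slide.
TYPE_BITS = {"int8": 8, "int16": 16, "int32": 32, "float16": 16, "float32": 32}


def elements_per_lane(type_name: str) -> int:
    """Number of elements of this type that fit in one 32-bit lane."""
    return LANE_BITS // TYPE_BITS[type_name]


def elements_per_quad_op(type_name: str) -> int:
    """Number of elements one quad-wide operation could touch (packing only)."""
    return elements_per_lane(type_name) * QUAD_WIDTH


if __name__ == "__main__":
    for name in TYPE_BITS:
        print(f"{name}: {elements_per_lane(name)} per lane, "
              f"{elements_per_quad_op(name)} per quad operation")
```

The float16 row gives two elements per lane against one for float32, which matches the 2x figure quoted for FP16.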
26 Special arithmetic operations. Special function hardware is smaller than the Midgard equivalent. Many transcendental functions supported. Special functions provide building blocks for compiled shader code, as part of the built-in function libraries. [Diagram: Main Regs Read, FMA, ADD/SF, Main Regs Write, with Temp Registers]
27 Load/store units
28 New core design. [Block diagram: Compute Frontend; Fragment Frontend; two Quad Creators; Quad Manager; Execution Engine 0, Execution Engine 1, Execution Engine 2; three Quad State blocks; Control Fabric; Load/store Unit; Attribute Unit; Varying Unit; Texture Unit; Blender & Tile Access; Depth & Stencil; ZS Memory; Tile Memory; Tile Writeback; three paths to the L2 memory system]
29 Bifrost load/store units. Separate units, scheduled separately, for better utilization. Load/store Unit: handles most general memory accesses; includes memory address translation and coherent caching. Attribute Unit: handles attribute indexing and addressing; defers to load/store for actual memory access. Varying Unit: handles varying interpolation; lower power, but more range and precision than Midgard.
30 Tiler
31 Geometry flow improvement. [Block diagram: Driver Software; Job Manager; Core 0, Core 1, Core 2 through Core 31; Control Fabric; Tiler; MMU; three L2 Cache Segments; three AXI Memory Buses]
32 Geometry flow: Midgard. Processing stages: Vertex Shading, Tiling, Fragment Shading. Read+write bandwidth [x times of storage size]: x 3.5x 2.5x. [Diagram: memory streams Positions, Attribs, Trans. Positions, Varyings, Indices, Polygon List; legend: a numbered marker shows the leading data stream at the numbered stage; a 1x marker shows bandwidth used relative to memory storage size]
33 Geometry flow: Bifrost - index-driven vertex shading. Processing stages: Position Shading, Tiling, Varying Shading, Fragment Shading. Read+write bandwidth [x times of storage size]: x 2.0x 2.5x 1.5x (additional diagram labels: 0.5x 0.5x). [Diagram: memory streams Positions, Attribs, Indices, Trans. Positions, Polygon List, Varyings; legend: a numbered marker shows the leading data stream at the numbered stage; a 1x marker shows bandwidth used relative to memory storage size]
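The stage order on this slide (position shading, tiling, then varying shading) suggests why the flow can save work: varyings only need to be computed for vertices that belong to primitives surviving the tiler. The sketch below is a rough illustration under that assumption; the mesh, the visibility test, and the function idvs_shading_counts are all hypothetical and do not reproduce the bandwidth figures on the slide.

```python
from typing import Callable, List, Set, Tuple

Triangle = Tuple[int, int, int]  # three vertex indices


def idvs_shading_counts(
    indices: List[Triangle],
    is_visible: Callable[[Triangle], bool],
) -> Tuple[int, int]:
    """Return (position-shaded vertices, varying-shaded vertices).

    Positions are shaded for every vertex referenced by the index buffer.
    Varyings are shaded only for vertices of triangles that pass the
    (hypothetical) visibility test applied during tiling.
    """
    referenced: Set[int] = {v for tri in indices for v in tri}
    surviving = [tri for tri in indices if is_visible(tri)]
    needs_varyings: Set[int] = {v for tri in surviving for v in tri}
    return len(referenced), len(needs_varyings)


# Hypothetical mesh: the last triangle is assumed to be culled.
mesh = [(0, 1, 2), (2, 1, 3), (4, 5, 6)]


def visible(tri: Triangle) -> bool:
    return tri != (4, 5, 6)


if __name__ == "__main__":
    print(idvs_shading_counts(mesh, visible))  # (7, 4)
```

Here seven vertices get position shading but only four need varying shading, so the attribute reads and varying writes for the culled triangle are avoided.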
34 Memory system
35 Memory system. Full coherency using ACE protocol. [Block diagram: Driver Software; Job Manager; Core 0, Core 1, Core 2 through Core 31; Control Fabric; Tiler; MMU; three L2 Cache Segments; three AXI Memory Buses]
36 Memory system. Full system coherency support: supports tightly coupled CPU+GPU use cases. L2 cache improvements: single logical L2 cache makes software easier; fewer partial lines written to AXI, which improves LPDDR4 performance. [Diagram: Cortex-A73 CPU, Mali-G71 GPU, CoreLink CCI-550, DMC-500, DRAM]
37 The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. Copyright 2016 ARM Limited
The Bifrost GPU architecture and the ARM Mali-G71 GPU
The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our
More informationDeveloping the Bifrost GPU architecture for mainstream graphics
Developing the Bifrost GPU architecture for mainstream graphics Anand Patel Senior Product Manager, Media Processing Group ARM Tech Symposia India December 7 th 2016 Graphics processing drivers Virtual
More informationHardware- Software Co-design at Arm GPUs
Hardware- Software Co-design at Arm GPUs Johan Grönqvist MCC 2017 - Uppsala About Arm Arm Mali GPUs: The World s #1 Shipping Graphics Processor 151 Total Mali licenses 21 Mali video and display licenses
More informationAchieving Console Quality Games on Mobile
Achieving Console Quality Games on Mobile Peter Harris, Senior Principal Engineer, ARM Unai Landa, CTO, Digital Legends Jon Kirkham, Staff Engineer, ARM GDC 2017 Agenda Premium smartphone in 2017 ARM Cortex
More informationMali-G72 Enabling tomorrow s technology today
Mali-G72 Enabling tomorrow s technology today Alan Tsai Senior Regional Marketing Manager Media Processing Group, ARM ARM Tech Forum Taipei July 4 th 2017 Mali High Performance GPU success 2 Mali-G71 in
More informationEach Milliwatt Matters
Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets
More informationProfiling and Debugging OpenCL Applications with ARM Development Tools. October 2014
Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline
More information3D Graphics in Future Mobile Devices. Steve Steele, ARM
3D Graphics in Future Mobile Devices Steve Steele, ARM Market Trends Mobile Computing Market Growth Volume in millions Mobile Computing Market Trends 1600 Smart Mobile Device Shipments (Smartphones and
More informationMali-G72: Enabling tomorrow s technology today
Mali-G72: Enabling tomorrow s technology today Ploutarchos Galatsopoulos Senior Product Manager Media Processing Group, ARM ARM Tech Forum Korea June 28 th 2017 ARM Mali: The world s #1 shipping GPU ~50%
More informationNext Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1
Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Ecosystem @neilt3d Copyright Khronos Group 2015 - Page 1 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon
More informationCopyright Khronos Group Page 1. Vulkan Overview. June 2015
Copyright Khronos Group 2015 - Page 1 Vulkan Overview June 2015 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon Open Consortium creating OPEN STANDARD APIs for hardware acceleration
More informationWorking with Metal Overview
Graphics and Games #WWDC14 Working with Metal Overview Session 603 Jeremy Sandmel GPU Software 2014 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission
More informationUnreal Engine 4: Mobile Graphics on ARM CPU and GPU Architecture
Unreal Engine 4: Mobile Graphics on ARM CPU and GPU Architecture Ray Hwang, Segment Marketing Manager, ARM Niklas Smedberg, Senior Engine Programmer, Epic Games Hessed Choi, Senior Field Applications Engineer,
More informationAntonio R. Miele Marco D. Santambrogio
Advanced Topics on Heterogeneous System Architectures GPU Politecnico di Milano Seminar Room A. Alario 18 November, 2015 Antonio R. Miele Marco D. Santambrogio Politecnico di Milano 2 Introduction First
More informationVulkan Multipass mobile deferred done right
Vulkan Multipass mobile deferred done right Hans-Kristian Arntzen Marius Bjørge Khronos 5 / 25 / 2017 Content What is multipass? What multipass allows... A driver to do versus MRT Developers to do Transient
More informationARM Multimedia IP: working together to drive down system power and bandwidth
ARM Multimedia IP: working together to drive down system power and bandwidth Speaker: Robert Kong ARM China FAE Author: Sean Ellis ARM Architect 1 Agenda System power overview Bandwidth, bandwidth, bandwidth!
More informationProfiling and Debugging Games on Mobile Platforms
Profiling and Debugging Games on Mobile Platforms Lorenzo Dal Col Senior Software Engineer, Graphics Tools Gamelab 2013, Barcelona 26 th June 2013 Agenda Introduction to Performance Analysis with ARM DS-5
More informationCS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology
CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367
More informationMaximizing heterogeneous system performance with ARM interconnect and CCIX
Maximizing heterogeneous system performance with ARM interconnect and CCIX Neil Parris, Director of product marketing Systems and software group, ARM Teratec June 2017 Intelligent flexible cloud to enable
More informationCase 1:17-cv SLR Document 1-3 Filed 01/23/17 Page 1 of 33 PageID #: 60 EXHIBIT C
Case 1:17-cv-00064-SLR Document 1-3 Filed 01/23/17 Page 1 of 33 PageID #: 60 EXHIBIT C Case 1:17-cv-00064-SLR Document 1-3 Filed 01/23/17 Page 2 of 33 PageID #: 61 U.S. Patent No. 7,633,506 VIZIO / Sigma
More informationMali Developer Resources. Kevin Ho ARM Taiwan FAE
Mali Developer Resources Kevin Ho ARM Taiwan FAE ARM Mali Developer Tools Software Development SDKs for OpenGL ES & OpenCL OpenGL ES Emulators Shader Development Studio Shader Library Asset Creation Texture
More informationARM. Mali GPU. OpenGL ES Application Optimization Guide. Version: 3.0. Copyright 2011, 2013 ARM. All rights reserved. ARM DUI 0555C (ID102813)
ARM Mali GPU Version: 3.0 OpenGL ES Application Optimization Guide Copyright 2011, 2013 ARM. All rights reserved. ARM DUI 0555C () ARM Mali GPU OpenGL ES Application Optimization Guide Copyright 2011,
More informationDave Shreiner, ARM March 2009
4 th Annual Dave Shreiner, ARM March 2009 Copyright Khronos Group, 2009 - Page 1 Motivation - What s OpenGL ES, and what can it do for me? Overview - Lingo decoder - Overview of the OpenGL ES Pipeline
More informationCourse Recap + 3D Graphics on Mobile GPUs
Lecture 18: Course Recap + 3D Graphics on Mobile GPUs Interactive Computer Graphics Q. What is a big concern in mobile computing? A. Power Two reasons to save power Run at higher performance for a fixed
More informationCS427 Multicore Architecture and Parallel Computing
CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:
More informationPowerVR Hardware. Architecture Overview for Developers
Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.
More informationExploring System Coherency and Maximizing Performance of Mobile Memory Systems
Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Shanghai: William Orme, Strategic Marketing Manager of SSG Beijing & Shenzhen: Mayank Sharma, Product Manager of SSG ARM Tech
More informationEnabling a Richer Multimedia Experience with GPU Compute. Roberto Mijat Visual Computing Marketing Manager
Enabling a Richer Multimedia Experience with GPU Compute Roberto Mijat Visual Computing Marketing Manager 1 What is GPU Compute Operating System and most application processing continue to reside on the
More informationIntegrating CPU and GPU, The ARM Methodology. Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM
Integrating CPU and GPU, The ARM Methodology Edvard Sørgård, Senior Principal Graphics Architect, ARM Ian Rickards, Senior Product Manager, ARM The ARM Business Model Global leader in the development of
More informationMobile Graphics Ecosystem. Tom Olson OpenGL ES working group chair
OpenGL ES in the Mobile Graphics Ecosystem Tom Olson OpenGL ES working group chair Director, Graphics Research, ARM Ltd 1 Outline Why Mobile Graphics? OpenGL ES Overview Getting Started with OpenGL ES
More informationLPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014)
A practitioner s view of challenges faced with power and performance on mobile GPU Prashant Sharma Samsung R&D Institute UK LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014) SERI
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationVulkan on Mobile. Daniele Di Donato, ARM GDC 2016
Vulkan on Mobile Daniele Di Donato, ARM GDC 2016 Outline Vulkan main features Mapping Vulkan Key features to ARM CPUs Mapping Vulkan Key features to ARM Mali GPUs 4 Vulkan Good match for mobile and tiling
More informationCopyright Khronos Group Page 1
Gaming Market Briefing Overview of APIs GDC March 2016 Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem ntrevett@nvidia.com @neilt3d Copyright Khronos Group 2016 - Page 1 Copyright
More informationEfficient and Scalable Shading for Many Lights
Efficient and Scalable Shading for Many Lights 1. GPU Overview 2. Shading recap 3. Forward Shading 4. Deferred Shading 5. Tiled Deferred Shading 6. And more! First GPU Shaders Unified Shaders CUDA OpenCL
More informationOptimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June
Optimizing and Profiling Unity Games for Mobile Platforms Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June 1 Agenda Introduction ARM and the presenter Preliminary knowledge
More informationTake GPU Processing Power Beyond Graphics with Mali GPU Computing
Take GPU Processing Power Beyond Graphics with Mali GPU Computing Roberto Mijat Visual Computing Marketing Manager August 2012 Introduction Modern processor and SoC architectures endorse parallelism as
More informationSpring 2011 Prof. Hyesoon Kim
Spring 2011 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on
More informationEE382N (20): Computer Architecture - Parallelism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez. The University of Texas at Austin
EE382 (20): Computer Architecture - ism and Locality Spring 2015 Lecture 09 GPUs (II) Mattan Erez The University of Texas at Austin 1 Recap 2 Streaming model 1. Use many slimmed down cores to run in parallel
More informationOptimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs
Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs Niu Feng Technical Specialist, ARM Tech Symposia 2016 Agenda Introduction Challenges: Optimizing cache coherent subsystem
More informationEnable AI on Mobile Devices
Enable AI on Mobile Devices Scott Wang 王舒翀 Senior Segment Manager Mobile, BSG ARM Tech Forum 2017 14 th June 2017, Shenzhen AI is moving from core to edge Ubiquitous AI Safe and autonomous Mixed reality
More informationPortland State University ECE 588/688. Graphics Processors
Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly
More informationThe Benefits of GPU Compute on ARM Mali GPUs
The Benefits of GPU Compute on ARM Mali GPUs Tim Hartley 1 SEMICON Europa 2014 ARM Introduction World leading semiconductor IP Founded in 1990 1060 processor licenses sold to more than 350 companies >
More informationSpring 2009 Prof. Hyesoon Kim
Spring 2009 Prof. Hyesoon Kim Application Geometry Rasterizer CPU Each stage cane be also pipelined The slowest of the pipeline stage determines the rendering speed. Frames per second (fps) Executes on
More informationBuilding High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye
Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400
More informationAchieving High-performance Graphics on Mobile With the Vulkan API
Achieving High-performance Graphics on Mobile With the Vulkan API Marius Bjørge Graphics Research Engineer GDC 2016 Agenda Overview Command Buffers Synchronization Memory Shaders and Pipelines Descriptor
More informationInside VR on Mobile. Sam Martin Graphics Architect GDC 2016
Inside VR on Mobile Sam Martin Graphics Architect GDC 2016 VR Today Emerging technology Main mobile VR ecosystems Google Cardboard Samsung GearVR In this talk: Latency Multiple views Performance tuning
More informationVulkan: Architecture positive How Vulkan maps to PowerVR GPUs Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics.
Vulkan: Architecture positive How Vulkan maps to PowerVR GPUs Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics www.imgtec.com Introduction Who am I? Kevin Sun Working at Imagination Technologies
More informationTEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems
International Conference on Supercomputing June 2013 TEAPOT: A Toolset for Evaluating Performance, Power and Image Quality on Mobile Graphics Systems Joan-Manuel Parcerisa Polychronis Xekalakis Computer
More informationThe NVIDIA GeForce 8800 GPU
The NVIDIA GeForce 8800 GPU August 2007 Erik Lindholm / Stuart Oberman Outline GeForce 8800 Architecture Overview Streaming Processor Array Streaming Multiprocessor Texture ROP: Raster Operation Pipeline
More informationGraphics Architectures and OpenCL. Michael Doggett Department of Computer Science Lund university
Graphics Architectures and OpenCL Michael Doggett Department of Computer Science Lund university Overview Parallelism Radeon 5870 Tiled Graphics Architectures Important when Memory and Bandwidth limited
More informationParallel Programming on Larrabee. Tim Foley Intel Corp
Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This
More informationChallenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008
Michael Doggett Graphics Architecture Group April 2, 2008 Graphics Processing Unit Architecture CPUs vsgpus AMD s ATI RADEON 2900 Programming Brook+, CAL, ShaderAnalyzer Architecture Challenges Accelerated
More informationMali-400 MP: A Scalable GPU for Mobile Devices Tom Olson
Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson Director, Graphics Research, ARM Outline ARM and Mobile Graphics Design Constraints for Mobile GPUs Mali Architecture Overview Multicore Scaling
More informationCase 1:17-cv SLR Document 1-4 Filed 01/23/17 Page 1 of 30 PageID #: 75 EXHIBIT D
Case 1:17-cv-00065-SLR Document 1-4 Filed 01/23/17 Page 1 of 30 PageID #: 75 EXHIBIT D Case 1:17-cv-00065-SLR Document 1-4 Filed 01/23/17 Page 2 of 30 PageID #: 76 U.S. Patent No. 7,633,506 LG / MediaTek
More informationSIGGRAPH Briefing August 2014
Copyright Khronos Group 2014 - Page 1 SIGGRAPH Briefing August 2014 Neil Trevett VP Mobile Ecosystem, NVIDIA President, Khronos Copyright Khronos Group 2014 - Page 2 Significant Khronos API Ecosystem Advances
More informationThreading Hardware in G80
ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &
More informationCSCI-GA Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs
CSCI-GA.3033-004 Graphics Processing Units (GPUs): Architecture and Programming Lecture 2: Hardware Perspective of GPUs Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com History of GPUs
More informationNVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield
NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host
More informationNew ARMv8-R technology for real-time control in safetyrelated
New ARMv8-R technology for real-time control in safetyrelated applications James Scobie Product manager ARM Technical Symposium China: Automotive, Industrial & Functional Safety October 31 st 2016 November
More informationSpring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett
Spring 2010 Prof. Hyesoon Kim AMD presentations from Richard Huddy and Michael Doggett Radeon 2900 2600 2400 Stream Processors 320 120 40 SIMDs 4 3 2 Pipelines 16 8 4 Texture Units 16 8 4 Render Backens
More informationGeForce4. John Montrym Henry Moreton
GeForce4 John Montrym Henry Moreton 1 Architectural Drivers Programmability Parallelism Memory bandwidth 2 Recent History: GeForce 1&2 First integrated geometry engine & 4 pixels/clk Fixed-function transform,
More informationOverview. Technology Details. D/AVE NX Preliminary Product Brief
Overview D/AVE NX is the latest and most powerful addition to the D/AVE family of rendering cores. It is the first IP to bring full OpenGL ES 2.0/3.1 rendering to the FPGA and SoC world. Targeted for graphics
More informationEECS 487: Interactive Computer Graphics
EECS 487: Interactive Computer Graphics Lecture 21: Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan Console Games Why do games look and perform so much better on consoles than on PCs with
More informationPowerVR Performance Recommendations. The Golden Rules
PowerVR Performance Recommendations Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind. Redistribution
More informationFrom Brook to CUDA. GPU Technology Conference
From Brook to CUDA GPU Technology Conference A 50 Second Tutorial on GPU Programming by Ian Buck Adding two vectors in C is pretty easy for (i=0; i
More informationAnalyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components
Analyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components By William Orme, Strategic Marketing Manager, ARM Ltd. and Nick Heaton, Senior Solutions Architect, Cadence Finding
More informationUnleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases. Steve Steele, ARM
Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases Steve Steele, ARM 1 Today s Computational Challenges Trends Growing display sizes and resolutions, richer
More informationModeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces
Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Li Chen, Staff AE Cadence China Agenda Performance Challenges Current Approaches Traffic Profiles Intro Traffic Profiles Implementation
More informationBifurcation Between CPU and GPU CPUs General purpose, serial GPUs Special purpose, parallel CPUs are becoming more parallel Dual and quad cores, roadm
XMT-GPU A PRAM Architecture for Graphics Computation Tom DuBois, Bryant Lee, Yi Wang, Marc Olano and Uzi Vishkin Bifurcation Between CPU and GPU CPUs General purpose, serial GPUs Special purpose, parallel
More informationGraphics Processing Unit Architecture (GPU Arch)
Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics
More informationMultimedia in Mobile Phones. Architectures and Trends Lund
Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson
More informationA SIMD-efficient 14 Instruction Shader Program for High-Throughput Microtriangle Rasterization
A SIMD-efficient 14 Instruction Shader Program for High-Throughput Microtriangle Rasterization Jordi Roca Victor Moya Carlos Gonzalez Vicente Escandell Albert Murciego Agustin Fernandez, Computer Architecture