Overview. Think Silicon is a privately held company founded in 2007 by the core team of Atmel MMC IC group

Size: px
Start display at page:

Download "Overview. Think Silicon is a privately held company founded in 2007 by the core team of Atmel MMC IC group"

Transcription

1 Nema An OpenGL & OpenCL Embedded Programmable Engine Georgios Keramidas & Iakovos Stamoulis Think Silicon mobile GRAPHICS

2 Overview Think Silicon is a privately held company founded in 2007 by the core team of Atmel MMC IC group Intellectual Property semiconductor cores in the field of computer graphics for mobile/embedded devices.

3 Mobile Graphics: Past- Present-Future t past Visualize data from various sources (Internet of Things paradigm) present future

4 Moore s Law in mobile GPUs SCALE LOG GPU performance GPU Power Display Resolution 10 years ago 5 years ago NOW in 5 years in 10 years Applications in next 5 years will need 3-4x the current GPU perf. Display resolution exponential increase (4K displays are here) BUT Power roughly under the same power budget (few hundreds milliwatts)

5 Application Processor now Multicore CPU Multicore GPU fully programmable Video Encoder Video Decoder Image Processors Camera Processors ASIC/ASIP (semi) programmable Vision Processors Market needs SMARTER mobile GPUs

6 Application Processor future Multicore CPU Multicore GPU fully programmable Video Encoder Video Decoder Image Processors Camera Processors Vision Processors Multicore Multipurpose Multidomain GPUs Mobile GPUs Media Processors & GPGPUs

7 Mobile GPU challenges Performance Power Consumption Low Die Area Latency Wall Memory Wall Frequency Wall Adaptability to new Standards Time to Market Performance per mm 2 /Watt/$

8 Think3D Processing Array Local Caches Shared Cache/MMU Nema Cluster 4 cores/cluster Router Multicore Modular Design NoC based Scalable Design SoCAXI fabric (internal signalling) Disruptive Architecture

9 Nema Graphics Cluster 4-cores DMA riangle Setup T g Tr riangle Cullin Rasterizer Early-Z nterpolators I Nema Cores Stencil/Blender Raster Operations Z-Buffer SIMD Multipoint Texture Mapping I$ D$ AXI AXI AXI Aggregator / MMU Unified Shading Architecture: Vertex, Fragment, GPGPU tasks GPU front-end: fixed hardware GPU back-end: programmable engine GPU Texture Unit: Multipoint, mipmap and texture compression support, separate or system cache

10 Nema Graphics Cluster 4-cores DMA riangle Setup T g Tr riangle Cullin Rasterizer nterpolators I Early-Z nterpolators I Nema Cores Stencil/Blender Raster Operations Z-Buffer SIMD 2D Composition Engine Entropy Encoder Multipoint Texture Mapping I$ D$ AXI AXI AXI Aggregator / MMU 1. Hardware entropy encoder for video processing 2. Backend programmable engine Image Signal Processing (ISP) 3. 2D/Composition Engine (Think2D) Turns off FP cores in simple GPU tasks

11 Nema ID Hybrid SIMD/MIMD machine Hardware/Software scheduler balances SIMD/MIMD threads in the cluster In-house ISA 32bit 3 b t Integer registers, s, 128bit Vector register, (4x32 IEEE single, 8x16 Half floats) Ultra-threaded th d Single or VLIW-like Dual Issue Core concurrent threads per core High utilization of EX resources

12 Nema ID Configurable number of pipeline stages High MHz frequencies (up to 16 stages) Each thread occupies a different pipe stage Hibernated threads Ready to run threads send for execution No hardware Feedbacks/Hazards/Interlocks less complexity smaller cores low power cores more cores Non Blocking Hide Latency at all levels No Stalling on a miss or long latency arithmetic operations other threads enter the pipeline

13 Low Power Features Power is a design parameter at all design stages Power Emulator estimates power consumption (GPGPUSim-Pow Pow from LPGPU) Core-level Cores can be turned on and off, Hardware Scheduler will not use the off cores Core-level DVFS Memory-level Sophisticated way selection techniques to reduce memory cell accesses Compression minimizes external access Framebuffer compression Z-buffer compression Various texture compression algorithms Separate 2D/Composition Engine Unit (Think2D) Keeps Every-hungry floating point Processors turned-off

14 GPU Special Features Fully Programmable GLSL/OpenGL ES On-the-fly Compression Z-buffer (patent in progress) Framebuffer (65% less framebuffer bandwidth) Hardware Value memoization mechanism reduces computation ti effort 30% (avg) less executed instruction Different number of threads per core (heterogeneous system) Up to 50% power reduction via run-time load balancing+dvfs in a 8-core CMP

15 GPGPU Special Features Fully Programmable C/C++/OpenMP/OpenCL Faster Divergent Thread performance (inherit by design) Up to 30% in complex conditional programs Faster Single Thread Performance Compiler driven interlocks & prefetching Up to 100% increase in speed (valuable in GPGPU tasks) Compiler detects dependencies and embeds scheduling info in instructions Maintain pipeline simplicity without hazard detection Easily extendable ISA for new generations apps & standards Currently: instructions for accelerating video processing DCT/iDCT DCT/iDCT SAD (Sum of Absolute Differences) Color Space Conversion on various formats

16 Think3D Ecosystem Assembler based on BinUtils 2.22 (in-house) Compiler based on LLVM-3.2 (in-house) Clang front-end for C/C++/OpenCL In-house GLSL to LLVM IR Polyhedral Optimizations (Polly) Link Time Optimizations (LTO) Libraries & APIs OpenCL API (based on pocl) OpenGL ESAPI (based on Mesa3D) Ported Pthreads-like library allows multithreading to be used immediately Ported libc/ libm / libcl

17 Shorter Time-to-Market Time-to-market market designs is the target Detailed Processor Description and exact Specification of each ISA instruction Hardware Registers, Pipeline Stages, system registers ISA Specification manual automated Documentation LLVM Compiler BinUtils Assembler Linker System C Emulator Verilog RTL Regression Suite Pass/Fail

18 Diverse applications supported Design one, use many Scalability, adaptability Threads per Core Estimate Power Cores per Cluster NoC-based for high number of Cores (8+) Change Pipe stages Modify instruction set

19 Status Today today today 14Q1 14Q1 first release First Release 14Q3 multicluster l t Nema Processor Passes regression tests Benchmarking ISA choices Power Management Algorithms 4 Core/1 cluster Nema multi cluster NoC Based Graphics Modules Rasterizer Interpolators Texture Mapping Memory subsystem Blending Unit MMU API supported OpenGL ES 2.0 OpenCL Testing FPGA prototype OpenGL ES 2.0 OpenCL System-level Optimizations H264/5 Image Processing OpenVX

20 Contact Info HQ Patras Science Park NL High Tech Campus 1-E Sale Inquiries : Rion Achaias 5656 AE Eindhoven sales@think-silicon.com Greece The Netherlands info : Tel: Fax: Tel: + 31 (0) info@think-silicon.com

Ultra Low Power GPUs for Wearables

Ultra Low Power GPUs for Wearables Ultra Low Power GPUs for Wearables Georgios Keramidas January 2015 The Company Who we are? Think Silicon is a privately held company founded in 2007. What we do? Development of low power GPU IP semiconductor

More information

Overview. Technology Details. D/AVE NX Preliminary Product Brief

Overview. Technology Details. D/AVE NX Preliminary Product Brief Overview D/AVE NX is the latest and most powerful addition to the D/AVE family of rendering cores. It is the first IP to bring full OpenGL ES 2.0/3.1 rendering to the FPGA and SoC world. Targeted for graphics

More information

CS427 Multicore Architecture and Parallel Computing

CS427 Multicore Architecture and Parallel Computing CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:

More information

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010

Introduction to Multicore architecture. Tao Zhang Oct. 21, 2010 Introduction to Multicore architecture Tao Zhang Oct. 21, 2010 Overview Part1: General multicore architecture Part2: GPU architecture Part1: General Multicore architecture Uniprocessor Performance (ECint)

More information

Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson

Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson Director, Graphics Research, ARM Outline ARM and Mobile Graphics Design Constraints for Mobile GPUs Mali Architecture Overview Multicore Scaling

More information

Multimedia in Mobile Phones. Architectures and Trends Lund

Multimedia in Mobile Phones. Architectures and Trends Lund Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson

More information

The Bifrost GPU architecture and the ARM Mali-G71 GPU

The Bifrost GPU architecture and the ARM Mali-G71 GPU The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our

More information

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance

More information

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014 Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline

More information

Bifrost - The GPU architecture for next five billion

Bifrost - The GPU architecture for next five billion Bifrost - The GPU architecture for next five billion Hessed Choi Senior FAE / ARM ARM Tech Forum June 28 th, 2016 Vulkan 2 ARM 2016 What is Vulkan? A 3D graphics API for the next twenty years Logical successor

More information

Working with Metal Overview

Working with Metal Overview Graphics and Games #WWDC14 Working with Metal Overview Session 603 Jeremy Sandmel GPU Software 2014 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission

More information

Parallel Computing: Parallel Architectures Jin, Hai

Parallel Computing: Parallel Architectures Jin, Hai Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer

More information

Enabling a Richer Multimedia Experience with GPU Compute. Roberto Mijat Visual Computing Marketing Manager

Enabling a Richer Multimedia Experience with GPU Compute. Roberto Mijat Visual Computing Marketing Manager Enabling a Richer Multimedia Experience with GPU Compute Roberto Mijat Visual Computing Marketing Manager 1 What is GPU Compute Operating System and most application processing continue to reside on the

More information

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant

More information

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller

CSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,

More information

Higher Level Programming Abstractions for FPGAs using OpenCL

Higher Level Programming Abstractions for FPGAs using OpenCL Higher Level Programming Abstractions for FPGAs using OpenCL Desh Singh Supervising Principal Engineer Altera Corporation Toronto Technology Center ! Technology scaling favors programmability CPUs."#/0$*12'$-*

More information

Each Milliwatt Matters

Each Milliwatt Matters Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 6. Parallel Processors from Client to Cloud COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 6 Parallel Processors from Client to Cloud Introduction Goal: connecting multiple computers to get higher performance

More information

PowerVR Hardware. Architecture Overview for Developers

PowerVR Hardware. Architecture Overview for Developers Public Imagination Technologies PowerVR Hardware Public. This publication contains proprietary information which is subject to change without notice and is supplied 'as is' without warranty of any kind.

More information

Windowing System on a 3D Pipeline. February 2005

Windowing System on a 3D Pipeline. February 2005 Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April

More information

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research

Optimizing Games for ATI s IMAGEON Aaftab Munshi. 3D Architect ATI Research Optimizing Games for ATI s IMAGEON 2300 Aaftab Munshi 3D Architect ATI Research A A 3D hardware solution enables publishers to extend brands to mobile devices while remaining close to original vision of

More information

Modern Processor Architectures. L25: Modern Compiler Design

Modern Processor Architectures. L25: Modern Compiler Design Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions

More information

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore

CSCI-GA Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore CSCI-GA.3033-012 Multicore Processors: Architecture & Programming Lecture 10: Heterogeneous Multicore Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Status Quo Previously, CPU vendors

More information

Shaders. Slide credit to Prof. Zwicker

Shaders. Slide credit to Prof. Zwicker Shaders Slide credit to Prof. Zwicker 2 Today Shader programming 3 Complete model Blinn model with several light sources i diffuse specular ambient How is this implemented on the graphics processor (GPU)?

More information

GeForce4. John Montrym Henry Moreton

GeForce4. John Montrym Henry Moreton GeForce4 John Montrym Henry Moreton 1 Architectural Drivers Programmability Parallelism Memory bandwidth 2 Recent History: GeForce 1&2 First integrated geometry engine & 4 pixels/clk Fixed-function transform,

More information

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008

Challenges for GPU Architecture. Michael Doggett Graphics Architecture Group April 2, 2008 Michael Doggett Graphics Architecture Group April 2, 2008 Graphics Processing Unit Architecture CPUs vsgpus AMD s ATI RADEON 2900 Programming Brook+, CAL, ShaderAnalyzer Architecture Challenges Accelerated

More information

PowerVR GPU IP from Wearables to Servers. Kristof Beets Director of Business Development May 2015

PowerVR GPU IP from Wearables to Servers. Kristof Beets Director of Business Development May 2015 PowerVR GPU IP from Wearables to Servers Kristof Beets Director of Business Development May 2015 www.imgtec.com Expanding embedded GPU market opportunities Huge range of market opportunities equates to

More information

3D Graphics in Future Mobile Devices. Steve Steele, ARM

3D Graphics in Future Mobile Devices. Steve Steele, ARM 3D Graphics in Future Mobile Devices Steve Steele, ARM Market Trends Mobile Computing Market Growth Volume in millions Mobile Computing Market Trends 1600 Smart Mobile Device Shipments (Smartphones and

More information

Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases. Steve Steele, ARM

Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases. Steve Steele, ARM Unleashing the benefits of GPU Computing with ARM Mali TM Practical applications and use-cases Steve Steele, ARM 1 Today s Computational Challenges Trends Growing display sizes and resolutions, richer

More information

The Benefits of GPU Compute on ARM Mali GPUs

The Benefits of GPU Compute on ARM Mali GPUs The Benefits of GPU Compute on ARM Mali GPUs Tim Hartley 1 SEMICON Europa 2014 ARM Introduction World leading semiconductor IP Founded in 1990 1060 processor licenses sold to more than 350 companies >

More information

Portland State University ECE 588/688. Graphics Processors

Portland State University ECE 588/688. Graphics Processors Portland State University ECE 588/688 Graphics Processors Copyright by Alaa Alameldeen 2018 Why Graphics Processors? Graphics programs have different characteristics from general purpose programs Highly

More information

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1 Gaming Market Briefing Overview of APIs GDC March 2016 Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem ntrevett@nvidia.com @neilt3d Copyright Khronos Group 2016 - Page 1 Copyright

More information

! Readings! ! Room-level, on-chip! vs.!

! Readings! ! Room-level, on-chip! vs.! 1! 2! Suggested Readings!! Readings!! H&P: Chapter 7 especially 7.1-7.8!! (Over next 2 weeks)!! Introduction to Parallel Computing!! https://computing.llnl.gov/tutorials/parallel_comp/!! POSIX Threads

More information

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal

Graphics Hardware, Graphics APIs, and Computation on GPUs. Mark Segal Graphics Hardware, Graphics APIs, and Computation on GPUs Mark Segal Overview Graphics Pipeline Graphics Hardware Graphics APIs ATI s low-level interface for computation on GPUs 2 Graphics Hardware High

More information

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1

X. GPU Programming. Jacobs University Visualization and Computer Graphics Lab : Advanced Graphics - Chapter X 1 X. GPU Programming 320491: Advanced Graphics - Chapter X 1 X.1 GPU Architecture 320491: Advanced Graphics - Chapter X 2 GPU Graphics Processing Unit Parallelized SIMD Architecture 112 processing cores

More information

GPUs and GPGPUs. Greg Blanton John T. Lubia

GPUs and GPGPUs. Greg Blanton John T. Lubia GPUs and GPGPUs Greg Blanton John T. Lubia PROCESSOR ARCHITECTURAL ROADMAP Design CPU Optimized for sequential performance ILP increasingly difficult to extract from instruction stream Control hardware

More information

Parallel Programming on Larrabee. Tim Foley Intel Corp

Parallel Programming on Larrabee. Tim Foley Intel Corp Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This

More information

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University

CSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand

More information

Bifurcation Between CPU and GPU CPUs General purpose, serial GPUs Special purpose, parallel CPUs are becoming more parallel Dual and quad cores, roadm

Bifurcation Between CPU and GPU CPUs General purpose, serial GPUs Special purpose, parallel CPUs are becoming more parallel Dual and quad cores, roadm XMT-GPU A PRAM Architecture for Graphics Computation Tom DuBois, Bryant Lee, Yi Wang, Marc Olano and Uzi Vishkin Bifurcation Between CPU and GPU CPUs General purpose, serial GPUs Special purpose, parallel

More information

From Application to Technology OpenCL Application Processors Chung-Ho Chen

From Application to Technology OpenCL Application Processors Chung-Ho Chen From Application to Technology OpenCL Application Processors Chung-Ho Chen Computer Architecture and System Laboratory (CASLab) Department of Electrical Engineering and Institute of Computer and Communication

More information

45-year CPU Evolution: 1 Law -2 Equations

45-year CPU Evolution: 1 Law -2 Equations 4004 8086 PowerPC 601 Pentium 4 Prescott 1971 1978 1992 45-year CPU Evolution: 1 Law -2 Equations Daniel Etiemble LRI Université Paris Sud 2004 Xeon X7560 Power9 Nvidia Pascal 2010 2017 2016 Are there

More information

Graphics Hardware. Instructor Stephen J. Guy

Graphics Hardware. Instructor Stephen J. Guy Instructor Stephen J. Guy Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability! Programming Examples Overview What is a GPU Evolution of GPU GPU Design Modern Features Programmability!

More information

Resource Efficiency of Scalable Processor Architectures for SDR-based Applications

Resource Efficiency of Scalable Processor Architectures for SDR-based Applications Resource Efficiency of Scalable Processor Architectures for SDR-based Applications Thorsten Jungeblut 1, Johannes Ax 2, Gregor Sievers 2, Boris Hübener 2, Mario Porrmann 2, Ulrich Rückert 1 1 Cognitive

More information

Copyright Khronos Group Page 1. Vulkan Overview. June 2015

Copyright Khronos Group Page 1. Vulkan Overview. June 2015 Copyright Khronos Group 2015 - Page 1 Vulkan Overview June 2015 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon Open Consortium creating OPEN STANDARD APIs for hardware acceleration

More information

Mali Developer Resources. Kevin Ho ARM Taiwan FAE

Mali Developer Resources. Kevin Ho ARM Taiwan FAE Mali Developer Resources Kevin Ho ARM Taiwan FAE ARM Mali Developer Tools Software Development SDKs for OpenGL ES & OpenCL OpenGL ES Emulators Shader Development Studio Shader Library Asset Creation Texture

More information

General Purpose GPU Programming. Advanced Operating Systems Tutorial 9

General Purpose GPU Programming. Advanced Operating Systems Tutorial 9 General Purpose GPU Programming Advanced Operating Systems Tutorial 9 Tutorial Outline Review of lectured material Key points Discussion OpenCL Future directions 2 Review of Lectured Material Heterogeneous

More information

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský

Real - Time Rendering. Graphics pipeline. Michal Červeňanský Juraj Starinský Real - Time Rendering Graphics pipeline Michal Červeňanský Juraj Starinský Overview History of Graphics HW Rendering pipeline Shaders Debugging 2 History of Graphics HW First generation Second generation

More information

Computer Architecture

Computer Architecture Computer Architecture Slide Sets WS 2013/2014 Prof. Dr. Uwe Brinkschulte M.Sc. Benjamin Betting Part 10 Thread and Task Level Parallelism Computer Architecture Part 10 page 1 of 36 Prof. Dr. Uwe Brinkschulte,

More information

Spring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett

Spring 2010 Prof. Hyesoon Kim. AMD presentations from Richard Huddy and Michael Doggett Spring 2010 Prof. Hyesoon Kim AMD presentations from Richard Huddy and Michael Doggett Radeon 2900 2600 2400 Stream Processors 320 120 40 SIMDs 4 3 2 Pipelines 16 8 4 Texture Units 16 8 4 Render Backens

More information

SIGGRAPH Briefing August 2014

SIGGRAPH Briefing August 2014 Copyright Khronos Group 2014 - Page 1 SIGGRAPH Briefing August 2014 Neil Trevett VP Mobile Ecosystem, NVIDIA President, Khronos Copyright Khronos Group 2014 - Page 2 Significant Khronos API Ecosystem Advances

More information

GPGPU introduction and network applications. PacketShaders, SSLShader

GPGPU introduction and network applications. PacketShaders, SSLShader GPGPU introduction and network applications PacketShaders, SSLShader Agenda GPGPU Introduction Computer graphics background GPGPUs past, present and future PacketShader A GPU-Accelerated Software Router

More information

Cymric A Framework for Prototyping Near-Memory Architectures

Cymric A Framework for Prototyping Near-Memory Architectures A Framework for Prototyping Near-Memory Architectures Chad D. Kersey 1, Hyesoon Kim 2, Sudhakar Yalamanchili 1 The rest of the team: Nathan Braswell, Jemmy Gazhenko, Prasun Gera, Meghana Gupta, Hyojong

More information

Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015

Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015 Copyright Khronos Group 2015 - Page 1 Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem

More information

Developing the Bifrost GPU architecture for mainstream graphics

Developing the Bifrost GPU architecture for mainstream graphics Developing the Bifrost GPU architecture for mainstream graphics Anand Patel Senior Product Manager, Media Processing Group ARM Tech Symposia India December 7 th 2016 Graphics processing drivers Virtual

More information

0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV

0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV 0;L$+LJK3HUIRUPDQFH ;3URFHVVRU:LWK,QWHJUDWHG'*UDSKLFV Rajeev Jayavant Cyrix Corporation A National Semiconductor Company 8/18/98 1 0;L$UFKLWHFWXUDO)HDWXUHV ¾ Next-generation Cayenne Core Dual-issue pipelined

More information

Compilers and Code Optimization EDOARDO FUSELLA

Compilers and Code Optimization EDOARDO FUSELLA Compilers and Code Optimization EDOARDO FUSELLA Contents LLVM The nu+ architecture and toolchain LLVM 3 What is LLVM? LLVM is a compiler infrastructure designed as a set of reusable libraries with well

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Boris Grot and Dr. Vijay Nagarajan!! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors:!

More information

Grafica Computazionale: Lezione 30. Grafica Computazionale. Hiding complexity... ;) Introduction to OpenGL. lezione30 Introduction to OpenGL

Grafica Computazionale: Lezione 30. Grafica Computazionale. Hiding complexity... ;) Introduction to OpenGL. lezione30 Introduction to OpenGL Grafica Computazionale: Lezione 30 Grafica Computazionale lezione30 Introduction to OpenGL Informatica e Automazione, "Roma Tre" May 20, 2010 OpenGL Shading Language Introduction to OpenGL OpenGL (Open

More information

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd Optimizing ARM SoC s with Carbon Performance Analysis Kits ARM Technical Symposia, Fall 2014 Andy Ladd Evolving System Requirements Processor Advances big.little Multicore Unicore DSP Cortex -R7 Block

More information

Next Generation Visual Computing

Next Generation Visual Computing Next Generation Visual Computing (Making GPU Computing a Reality with Mali ) Taipei, 18 June 2013 Roberto Mijat ARM Addressing Computational Challenges Trends Growing display sizes and resolutions Increasing

More information

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology

CS8803SC Software and Hardware Cooperative Computing GPGPU. Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology CS8803SC Software and Hardware Cooperative Computing GPGPU Prof. Hyesoon Kim School of Computer Science Georgia Institute of Technology Why GPU? A quiet revolution and potential build-up Calculation: 367

More information

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield

NVIDIA GTX200: TeraFLOPS Visual Computing. August 26, 2008 John Tynefield NVIDIA GTX200: TeraFLOPS Visual Computing August 26, 2008 John Tynefield 2 Outline Execution Model Architecture Demo 3 Execution Model 4 Software Architecture Applications DX10 OpenGL OpenCL CUDA C Host

More information

Copyright Khronos Group Page 1

Copyright Khronos Group Page 1 Open Standards and Open Source Together How Khronos APIs Accelerate Fast and Cool Applications Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Copyright Khronos Group 2015 - Page

More information

What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others

What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. * slides thanks to Kavita Bala & many others What Next? Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University * slides thanks to Kavita Bala & many others Final Project Demo Sign-Up: Will be posted outside my office after lecture today.

More information

Understanding Shaders and WebGL. Chris Dalton & Olli Etuaho

Understanding Shaders and WebGL. Chris Dalton & Olli Etuaho Understanding Shaders and WebGL Chris Dalton & Olli Etuaho Agenda Introduction: Accessible shader development with WebGL Understanding WebGL shader execution: from JS to GPU Common shader bugs Accessible

More information

GPU Memory Model Overview

GPU Memory Model Overview GPU Memory Model Overview John Owens University of California, Davis Department of Electrical and Computer Engineering Institute for Data Analysis and Visualization SciDAC Institute for Ultrascale Visualization

More information

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer

Real-Time Rendering (Echtzeitgraphik) Michael Wimmer Real-Time Rendering (Echtzeitgraphik) Michael Wimmer wimmer@cg.tuwien.ac.at Walking down the graphics pipeline Application Geometry Rasterizer What for? Understanding the rendering pipeline is the key

More information

CONSOLE ARCHITECTURE

CONSOLE ARCHITECTURE CONSOLE ARCHITECTURE Introduction Part 1 What is a console? Console components Differences between consoles and PCs Benefits of console development The development environment Console game design What

More information

Graph Streaming Processor

Graph Streaming Processor Graph Streaming Processor A Next-Generation Computing Architecture Val G. Cook Chief Software Architect Satyaki Koneru Chief Technology Officer Ke Yin Chief Scientist Dinakar Munagala Chief Executive Officer

More information

Original PlayStation: no vector processing or floating point support. Photorealism at the core of design strategy

Original PlayStation: no vector processing or floating point support. Photorealism at the core of design strategy Competitors using generic parts Performance benefits to be had for custom design Original PlayStation: no vector processing or floating point support Geometry issues Photorealism at the core of design

More information

Lecture 2. Shaders, GLSL and GPGPU

Lecture 2. Shaders, GLSL and GPGPU Lecture 2 Shaders, GLSL and GPGPU Is it interesting to do GPU computing with graphics APIs today? Lecture overview Why care about shaders for computing? Shaders for graphics GLSL Computing with shaders

More information

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan Course II Parallel Computer Architecture Week 2-3 by Dr. Putu Harry Gunawan www.phg-simulation-laboratory.com Review Review Review Review Review Review Review Review Review Review Review Review Processor

More information

EECS 487: Interactive Computer Graphics

EECS 487: Interactive Computer Graphics EECS 487: Interactive Computer Graphics Lecture 21: Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan Console Games Why do games look and perform so much better on consoles than on PCs with

More information

Introduction to GPU computing

Introduction to GPU computing Introduction to GPU computing Nagasaki Advanced Computing Center Nagasaki, Japan The GPU evolution The Graphic Processing Unit (GPU) is a processor that was specialized for processing graphics. The GPU

More information

Taipei Embedded Outreach OpenCL DSP Profile Proposals

Taipei Embedded Outreach OpenCL DSP Profile Proposals Copyright 2018 The Khronos Group Inc. Page 1 Taipei Embedded Outreach OpenCL DSP Profile Proposals Prof. Jenq-Kuen Lee, NTHU Taipei, January 2018 Copyright 2018 The Khronos Group Inc. Page 2 Outline Speaker

More information

Technology and Design Tools for Multicore Embedded Systems Software Development

Technology and Design Tools for Multicore Embedded Systems Software Development Technology and Design Tools for Multicore Embedded Systems Software Development Yuriy Sheynin, Alexey Syschikov, Boris Sedov Saint Petersburg State University of Aerospace Instrumentation Why do we need

More information

Threading Hardware in G80

Threading Hardware in G80 ing Hardware in G80 1 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA 2 3D 3D API: API: OpenGL OpenGL or or Direct3D Direct3D GPU Command &

More information

Administrivia. Minute Essay From 4/11

Administrivia. Minute Essay From 4/11 Administrivia All homeworks graded. If you missed one, I m willing to accept it for partial credit (provided of course that you haven t looked at a sample solution!) through next Wednesday. I will grade

More information

Computer Architecture 计算机体系结构. Lecture 10. Data-Level Parallelism and GPGPU 第十讲 数据级并行化与 GPGPU. Chao Li, PhD. 李超博士

Computer Architecture 计算机体系结构. Lecture 10. Data-Level Parallelism and GPGPU 第十讲 数据级并行化与 GPGPU. Chao Li, PhD. 李超博士 Computer Architecture 计算机体系结构 Lecture 10. Data-Level Parallelism and GPGPU 第十讲 数据级并行化与 GPGPU Chao Li, PhD. 李超博士 SJTU-SE346, Spring 2017 Review Thread, Multithreading, SMT CMP and multicore Benefits of

More information

Update on Khronos Open Standard APIs for Vision Processing Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem

Update on Khronos Open Standard APIs for Vision Processing Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Update on Khronos Open Standard APIs for Vision Processing Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Copyright Khronos Group 2015 - Page 1 Copyright Khronos Group 2015 - Page

More information

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13

Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Computer Architecture: Multi-Core Processors: Why? Onur Mutlu & Seth Copen Goldstein Carnegie Mellon University 9/11/13 Moore s Law Moore, Cramming more components onto integrated circuits, Electronics,

More information

Dave Shreiner, ARM March 2009

Dave Shreiner, ARM March 2009 4 th Annual Dave Shreiner, ARM March 2009 Copyright Khronos Group, 2009 - Page 1 Motivation - What s OpenGL ES, and what can it do for me? Overview - Lingo decoder - Overview of the OpenGL ES Pipeline

More information

Graphics Processing Unit Architecture (GPU Arch)

Graphics Processing Unit Architecture (GPU Arch) Graphics Processing Unit Architecture (GPU Arch) With a focus on NVIDIA GeForce 6800 GPU 1 What is a GPU From Wikipedia : A specialized processor efficient at manipulating and displaying computer graphics

More information

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE Haibo Xie, Ph.D. Chief HSA Evangelist AMD China OUTLINE: The Challenges with Computing Today Introducing Heterogeneous System Architecture (HSA)

More information

Mobile Graphics Ecosystem. Tom Olson OpenGL ES working group chair

Mobile Graphics Ecosystem. Tom Olson OpenGL ES working group chair OpenGL ES in the Mobile Graphics Ecosystem Tom Olson OpenGL ES working group chair Director, Graphics Research, ARM Ltd 1 Outline Why Mobile Graphics? OpenGL ES Overview Getting Started with OpenGL ES

More information

Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS

Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS Who am I? Education Master of Technology, NTNU, 2007 PhD, NTNU, 2010. Title: «Managing Shared Resources in Chip Multiprocessor Memory

More information

Current Trends in Computer Graphics Hardware

Current Trends in Computer Graphics Hardware Current Trends in Computer Graphics Hardware Dirk Reiners University of Louisiana Lafayette, LA Quick Introduction Assistant Professor in Computer Science at University of Louisiana, Lafayette (since 2006)

More information

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee

More information

Chapter 18 - Multicore Computers

Chapter 18 - Multicore Computers Chapter 18 - Multicore Computers Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ Luis Tarrataca Chapter 18 - Multicore Computers 1 / 28 Table of Contents I 1 2 Where to focus your study Luis Tarrataca

More information

Regression Modelling of Power Consumption for Heterogeneous Processors. Tahir Diop

Regression Modelling of Power Consumption for Heterogeneous Processors. Tahir Diop Regression Modelling of Power Consumption for Heterogeneous Processors by Tahir Diop A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department

More information

General Purpose GPU Programming. Advanced Operating Systems Tutorial 7

General Purpose GPU Programming. Advanced Operating Systems Tutorial 7 General Purpose GPU Programming Advanced Operating Systems Tutorial 7 Tutorial Outline Review of lectured material Key points Discussion OpenCL Future directions 2 Review of Lectured Material Heterogeneous

More information

Simplify System Complexity

Simplify System Complexity 1 2 Simplify System Complexity With the new high-performance CompactRIO controller Arun Veeramani Senior Program Manager National Instruments NI CompactRIO The Worlds Only Software Designed Controller

More information

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution

More information

CPU-GPU Heterogeneous Computing

CPU-GPU Heterogeneous Computing CPU-GPU Heterogeneous Computing Advanced Seminar "Computer Engineering Winter-Term 2015/16 Steffen Lammel 1 Content Introduction Motivation Characteristics of CPUs and GPUs Heterogeneous Computing Systems

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

EE282 Computer Architecture. Lecture 1: What is Computer Architecture? EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer

More information

Mobile AR Hardware Futures

Mobile AR Hardware Futures Copyright Khronos Group, 2010 - Page 1 Mobile AR Hardware Futures Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group Two Perspectives NVIDIA - Tegra 2 mobile processor Khronos

More information

Mobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools

Mobile Performance Tools and GPU Performance Tuning. Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools Mobile Performance Tools and GPU Performance Tuning Lars M. Bishop, NVIDIA Handheld DevTech Jason Allen, NVIDIA Handheld DevTools NVIDIA GoForce5500 Overview World-class 3D HW Geometry pipeline 16/32bpp

More information

GPGPU on Mobile Devices

GPGPU on Mobile Devices GPGPU on Mobile Devices Introduction Addressing GPGPU for very mobile devices Tablets Smartphones Introduction Why dedicated GPUs in mobile devices? Gaming Physics simulation for realistic effects 3D-GUI

More information

Computer Architecture!

Computer Architecture! Informatics 3 Computer Architecture! Dr. Boris Grot and Dr. Vijay Nagarajan!! Institute for Computing Systems Architecture, School of Informatics! University of Edinburgh! General Information! Instructors

More information