Silicon Acceleration APIs
|
|
- Cory Johnston
- 5 years ago
- Views:
Transcription
1 Copyright Khronos Group Page 1 Silicon Acceleration APIs Embedded Technology 2016, Yokohama Neil Trevett Vice President Developer Ecosystem, NVIDIA President, Khronos November 2016
2 Copyright Khronos Group Page 2 Khronos and Open Standards Software Silicon Khronos is an Industry Consortium of over 100 companies We create royalty-free, open standard APIs for hardware acceleration of Graphics, Parallel Compute, Neural Networks and Vision
3 Copyright Khronos Group Page 3 Accelerated API Landscape Vision Frameworks Neural Net Libraries OpenVX Neural Net Extension High-level Language-based Acceleration Frameworks 3D Graphics Explicit Kernels GPU FPGA DSP Dedicated Hardware
4 Copyright Khronos Group Page 4 OpenCL Low-level Parallel Programing Low level programming of heterogeneous parallel compute resources - One code tree can be executed on CPUs, GPUs, DSPs and FPGA OpenCL C language to write kernel programs to execute on any compute device - Platform Layer API - to query, select and initialize compute devices - Runtime API - to build and execute kernels programs on multiple devices New in OpenCL OpenCL C++ kernel language - a static subset of C Adaptable and elegant sharable code great for building libraries - Templates enable meta-programming for highly adaptive software - Lambdas used to implement nested/dynamic parallelism OpenCL Kernel OpenCL Code Kernel OpenCL Code Kernel OpenCL Code Kernel Code Kernel code compiled for devices GPU DSP FPGA CPU CPU Devices Runtime API loads and executes kernels across devices CPU Host
5 OpenCL Conformant Implementations 1.0 May Jul Jun Aug Aug May Dec May Feb11 Desktop 1.1 Mar Dec Jul Jun May Jun May Aug Mar Feb Sep13 Mobile 1.1 Nov Apr Nov Apr Dec Jan May Jul Sep14 FPGA 1.0 Dec14 Embedded 1.2 May Aug15 Vendor timelines are first implementation of each spec generation Dec08 OpenCL 1.0 Specification Jun10 OpenCL 1.1 Specification Nov11 OpenCL 1.2 Specification Nov13 OpenCL 2.0 Specification Nov15 OpenCL 2.1 Specification Copyright Khronos Group Page 5
6 Copyright Khronos Group Page 6 SYCL for OpenCL Single-source heterogeneous programming using STANDARD C++ - Use C++ templates and lambda functions for host & device code Aligns the hardware acceleration of OpenCL with direction of the C++ standard - C++14 with open source C++17 Parallel STL hosted by Khronos Developer Choice The development of the two specifications are aligned so code can be easily shared between the two approaches C++ Kernel Language Low Level Control GPGPU -style separation of device-side kernel source code and host code Single-source C++ Programmer Familiarity Approach also taken by C++ AMP and OpenMP
7 Copyright Khronos Group Page 7 Embedded Vision Acceleration Embedded Technology, Yokohama Shorin Kyo, Huawei
8 Power Efficiency Copyright Khronos Group Page 8 OpenVX Low Power Vision Acceleration Targeted at vision acceleration in real-time, mobile and embedded platforms - Precisely defined API for production deployment Higher abstraction than OpenCL for performance portability across diverse architectures - Multi-core CPUs, GPUs, DSPs and DSP arrays, ISPs, Dedicated hardware Extends portable vision acceleration to very low power domains - Doesn t require high-power CPU/GPU Complex or OpenCL precision X100 X10 X1 Dedicated Hardware Vision DSPs Vision Processing Efficiency GPU Compute Multi-core CPU Computation Flexibility Vision Engine Middleware Application GPU DSP Hardware
9 Copyright Khronos Group Page 9 OpenVX Graphs OpenVX developers express a graph of image operations ( Nodes ) - Nodes can be on any hardware or processor coded in any language Graphs can execute almost autonomously - Possible to Minimize host interaction during frame-rate graph execution Graphs are the key to run-time optimization opportunities Camera Input OpenVX Nodes Pyr t Rendering Output RGB Frame Color Conversion YUV Frame Channel Extract Gray Frame Image Pyramid Optical Flow Array of Keypoints Harris Track OpenVX Graph Feature Extraction Example Graph Ftr t-1 Array of Features
10 Copyright Khronos Group Page 10 OpenVX Efficiency through Graphs.. Graph Scheduling Memory Management Kernel Merge Data Tiling Split the graph execution across the whole system: CPU / GPU / dedicated HW Reuse pre-allocated memory for multiple intermediate data Replace a subgraph with a single faster node Execute a subgraph at tile granularity instead of image granularity Faster execution or lower power consumption Less allocation overhead, more memory for other applications Better memory locality, less kernel launch overhead Better use of data cache and local memory
11 Copyright Khronos Group Page 11 Example Relative Performance 10 9 Relative Performance OpenCV (GPU accelerated) OpenVX (GPU accelerated) 2.5 NVIDIA implementation experience. Geometric mean of >2200 primitives, grouped into each categories, running at different image sizes and parameter settings Arithmetic Analysis Filter Geometric Overall
12 Copyright Khronos Group Page 12 Layered Vision Processing Ecosystem Implementers may use OpenCL or Compute Shaders to implement OpenVX nodes on programmable processors And then developers can use OpenVX to enable a developer to easily connect those nodes into a graph Programmable Vision Processors Application C/C++ Dedicated Vision Hardware AMD OpenVX - Open source, highly optimized for x86 CPU and OpenCL for GPU - Graph Optimizer looks at entire processing pipeline and removes/replaces/merges functions to improve performance and bandwidth - Scripting for rapid prototyping, without re-compiling, at production performance levels OpenVX enables the graph to be extended to include hardware architectures that don t support programmable APIs The OpenVX graph enables implementers to optimize execution across diverse hardware architectures an drive to lower power implementations
13 Copyright Khronos Group Page 13 OpenVX 1.0 Shipping, OpenVX 1.1 Released! Multiple OpenVX 1.0 Implementations shipping spec in October Open source sample implementation and conformance tests available OpenVX 1.1 Specification released 2nd May Expands node functionality AND enhances graph framework OpenVX is EXTENSIBLE - Implementers can add their own nodes at any time to meet customer and market needs = shipping implementations
14 Copyright Khronos Group Page 14 OpenVX Neural Net Extension Convolution Neural Network topologies can be represented as OpenVX graphs - Layers are represented as OpenVX nodes - Layers connected by multi-dimensional tensors objects - Layer types include convolution, activation, pooling, fully-connected, soft-max - CNN nodes can be mixed with traditional vision nodes Import/Export Extension - Efficient handling of network Weights/Biases or complete networks The specification is provisional - Welcome feedback from the deep learning community Native Camera Control Vision Node Vision Node CNN Nodes Vision Node Downstream Application Processing An OpenVX graph mixing CNN nodes with traditional vision nodes
15 Copyright Khronos Group Page 15 NNEF - Neural Network Exchange Format NN Authoring Framework 1 NN Authoring Framework 2 NN Authoring Framework 3 NN Authoring Framework 1 NN Authoring Framework 2 NN Authoring Framework 3 Neural Net Exchange Format Inference Engine 1 Inference Engine 2 Inference Engine 3 Inference Engine 1 Inference Engine 2 Inference Engine 3 NNEF encapsulates neural network structure, data formats, commonly used operations (such as convolution, pooling, normalization, etc.) and formal network semantics NNEF 1.0 is currently being defined OpenVX will import NNEF files
16 OpenGL ES Fixed function Pipeline Vertex and fragment shaders 32-bit integers and floats NPOT, 3D/depth textures Texture arrays Multiple Render Targets Compute Shaders Tessellation and geometry shaders ASTC Texture Compression Floating point render targets Debug and robustness for security Epic s Rivalry demo using full Unreal Engine ES 1.0 Driver Update 2004 ES 1.1 Silicon Update 2007 ES 2.0 Close to 2 Billion OpenGL ES devices shipped in 2015 Silicon Update 2012 ES 3.0 Driver Update 2014 ES 3.1 L Silicon Update AEP 2015 ES 3.2 Driver Update 2016 N OpenGL ES 2.0 OpenGL ES 3.x Vertex and Fragment Shaders Compute Shaders Copyright Khronos Group Page 16
17 Copyright Khronos Group Page 17 The Key Principle of Vulkan: Explicit Control Application tells the driver what it is going to do - In enough detail that driver doesn t have to guess - When the driver needs to know it You have to supply a LOT of information! And, you have to supply it early In return, driver promises to do - What the application asks for - When it asks for it - Very quickly Efficient, predictable performance No driver magic no surprises
18 Copyright Khronos Group Page 18 Vulkan Explicit GPU Control Application Single thread per context High-level Driver Abstraction Context management Memory allocation Full GLSL compiler Error detection Layered GPU Control GPU Application Memory allocation Thread management Synchronization Multi-threaded generation of command buffers Thin Driver Explicit GPU Control GPU Vulkan 1.0 provides access to OpenGL ES 3.1 / OpenGL 4.X-class GPU functionality but with increased performance and flexibility Language Front-end Compilers Initially GLSL SPIR-V pre-compiled shaders Loadable debug and validation layers Vulkan Benefits Resource management in app code: Less driver hitches and surprises Simpler drivers: Improved efficiency/performance Reduced CPU bottlenecks Lower latency Increased portability Multi-threaded Command Buffers: Command creation can be multi-threaded Multiple CPU cores increase performance Graphics, compute and DMA queues: Work dispatch flexibility SPIR-V Pre-compiled Shaders: No front-end compiler in driver Future shading language flexibility Loadable Layers No error handling overhead in production code
19 Copyright Khronos Group Page 19 Vulkan Multi-threading Efficiency CPU Thread Command Buffer 1. Multiple threads can construct Command Buffers in parallel Application is responsible for thread management and synch Command Buffer CPU Thread CPU Thread CPU Thread Command Buffer Command Queue GPU Command Buffer CPU Thread 2. Command Buffers placed in Command Queue by separate submission thread CPU Thread Command Buffer Applications can create graphics, compute and DMA command buffers with a general queue model that can be extended to more heterogeneous processing in the future
20 Khronos Safety Critical APIs DO-178B/C ISO OpenGL SC Fixed function graphics subset Experience and Guidelines OpenGL SC April 2016 Shader programmable pipeline subset New Generation APIs for safety certifiable vision, graphics and compute OpenGL ES Fixed function graphics OpenGL ES Shader programmable pipeline Safety Critical vision processing Safety Critical heterogeneous compute Small driver size Advanced functionality Graphics and compute Copyright Khronos Group Page 20
21 Copyright Khronos Group Page 21 Please Consider Joining Khronos! Understand early industry requirements! Influence how standards evolve! Ship products that conform to international standards! Access draft specs to build products faster! Khronos is proven to RAPIDLY generate hardware API standards that create significant market opportunities Any company or organization is welcome to join Khronos for a voice and a vote in any of its standards
22 Copyright Khronos Group Page 22 VeriSilicon Computer Vision solution with OpenVX over VIP Gaku Ogura Simon Jones 18 November 2016
23 Copyright Khronos Group Page 23 VeriSilicon Who We Are Founded in 2001 Over 650 employees 70% dedicated to R&D
24 Copyright Khronos Group Page 24 VeriSilicon What we can do? Idea to Spec Spec to RTL IP Development & Licensing System Architecture Silicon Design Shipping Test RTL to Netlist Package Netlist to GDS Manufacture
25 Movement of Intelligent Devices Multi-Media Graphics, Video, Audio, Voice Natural User Interface - VISION - Natural Language Processing - Sensors Many applications need flexibility 1000s of algorithms new one every day Compute intensive need hardware acceleration Algorithms keep changing need programmability Copyright Khronos Group Page 25
26 Copyright Khronos Group Page 26 Deep Learning: Neural Networks Return with Vengeance With big data and high density compute, deep neural networks can be trained to solve impossible computer vision problems % Teams using Deep Learning
27 Copyright Khronos Group Page 27 The Basics of Deep Neural Networks (DNN) Training: MACs/dataset Offline (mostly) Analogous to compiling forward Pug =? Rottweiler Inference: MACs/image On-the-fly Analogous to running s/w Embedded systems use this backward 100K-100M Network Coefficients forward error Rottweiler
28 Challenges of Real-Time Inference Classification Detection Segmentation 100G MAC/s 1T MAC/s 10T MAC/s Majority of deep neural net computation is Multiply-Accumulate (MAC) for convolutions and inner products Memory bandwidth is a bigger challenge - Convolution in deep neural nets works with not just 2D images, but high-dimensional arrays (i.e. tensor ) - Example: AlexNet (classification) has 60M parameters (240MB FP32), DDR BW with brute force implementation: 35GB/s Copyright Khronos Group Page 28
29 Copyright Khronos Group Page 29 Processor Architectures Programmability CPU OpenCL OpenVX OpenCV VIP VeriSilicon GPU VeriSilicon ARM Imagination Custom RTL DSP VeriSilicon CEVA Videantis Cadence Synopsys Energy
30 Copyright Khronos Group Page 30 Efficient Processor Architecture to Enable DNN for Embedded Vision Programmable engine to cope with new CV algorithms Future new NN layers can be implemented here Scalable architecture to support different PPA (Performance, Power, Area) requirements High utilization MACs for DNN Scalable architecture to support different PPA (Performance, Power, Area) requirements Handles other DNN functions (normalization, pooling, ) Handles pruning, compression, batching to increase MAC utilization and decrease DDR BW Vision Framework API Programmable Programmable Engine Engine Engine Vision Applications Scalable Scalable MACs MACs Intelligent Data Fabrics Local Memory / Cache Deep Learning Frameworks Mature CV Block HW Accelerator Mature CV Block HW Accelerator Customer Defined HW Accelerator Custom RTL to accelerate mature CV algorithms for 24/7 low power operation I/F to extend HW acceleration for application specific purpose For data synchronization between threads/blocks and minimize DDR BW
31 Copyright Khronos Group Page 31 Use Case Study: Faster RCNN One of state-of-the-art object detection CNNs - Network configuration region proposals - 60M parameters, 30G MACs/image Collaborative computing in VIP - NN Engine: number crunching - Convolution layers, full connected layers - Tensor Processing Fabric: data rearrangement - LRN, max pooling, ROI pooling - Programmable Engine: precision compute - RPN, NMS, softmax Programmabl e Engine NN Cores TP Fabric Local Memory / Cache Output of a layer is queued in DDR if next layer is on different processing engine Real-time performance (30fps HD) under 28nm HPM
32 Copyright Khronos Group Page 32 Scalable Performance from VIP8000 VIP8000 Series VIP Nano VIP8000UL VIP8000L VIP8000 Number of Execution cores Clock Frequency (MHz) HD Performance@800MHz Perspective Warping 60 fps 120 fps 240 fps 480 fps Optical Flow LK 30 fps 60 fps 120 fps 240 fps Pedestrian Detection 30 fps 60 fps 120 fps 240 fps Convolutional Neural Network (AlexNet) 125 fps 250 fps 500 fps 1000 fps Scalable performance with SAME application code on different processor variants
33 Comprehensive Multi-Media/Pixel Subsystems CPU VPUCore (Video) GC Core (GPU) VIP Core (Vision/Image) ZSP (Audio/Voice) ISP AXI Bus VeriSilicon Universal Compression Local Bus IO Controller DMA Engine DC (Display) Memory Controller Wireless Connectivity Mixed Signal IPs Company Proprietary and Copyright Khronos Group Page 33
34 Copyright Khronos Group Page 34
Accelerating Vision Processing
Accelerating Vision Processing Neil Trevett Vice President Mobile Ecosystem at NVIDIA President of Khronos and Chair of the OpenCL Working Group SIGGRAPH, July 2016 Copyright Khronos Group 2016 - Page
More informationCopyright Khronos Group Page 1
Gaming Market Briefing Overview of APIs GDC March 2016 Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem ntrevett@nvidia.com @neilt3d Copyright Khronos Group 2016 - Page 1 Copyright
More informationUpdate on Khronos Open Standard APIs for Vision Processing Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem
Update on Khronos Open Standard APIs for Vision Processing Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Copyright Khronos Group 2015 - Page 1 Copyright Khronos Group 2015 - Page
More informationStandards for Vision Processing and Neural Networks
Copyright Khronos Group 2017 - Page 1 Standards for Vision Processing and Neural Networks Radhakrishna Giduthuri, AMD radha.giduthuri@ieee.org Agenda Why we need a standard? Khronos NNEF Khronos OpenVX
More informationCopyright Khronos Group Page 1
Open Standards and Open Source Together How Khronos APIs Accelerate Fast and Cool Applications Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Copyright Khronos Group 2015 - Page
More informationNext Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Copyright Khronos Group Page 1
Next Generation OpenGL Neil Trevett Khronos President NVIDIA VP Mobile Ecosystem @neilt3d Copyright Khronos Group 2015 - Page 1 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon
More informationThe OpenVX Computer Vision and Neural Network Inference
The OpenVX Computer and Neural Network Inference Standard for Portable, Efficient Code Radhakrishna Giduthuri Editor, OpenVX Khronos Group radha.giduthuri@amd.com @RadhaGiduthuri Copyright 2018 Khronos
More informationOpen Standards for AR and VR Neil Trevett Khronos President NVIDIA VP Developer January 2018
Copyright Khronos Group 2018 - Page 1 Open Standards for AR and Neil Trevett Khronos President NVIDIA VP Developer Ecosystem ntrevett@nvidia.com @neilt3d January 2018 Khronos Mission E.g. OpenGL ES provides
More informationCopyright Khronos Group Page 1
OpenCL A State of the Union Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem OpenCL Working Group Chair ntrevett@nvidia.com @neilt3d Vienna, April 2016 Copyright Khronos Group 2016
More informationNavigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015
Copyright Khronos Group 2015 - Page 1 Navigating the Vision API Jungle: Which API Should You Use and Why? Embedded Vision Summit, May 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem
More informationOpen Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc May 2018
Copyright Khronos Group 2018 - Page 1 Open Standards for Vision and AI Peter McGuinness NNEF WG Chair CEO, Highwai, Inc peter.mcguinness@gobrach.com May 2018 Khronos Mission E.g. OpenGL ES provides 3D
More informationCopyright Khronos Group Page 1. Vulkan Overview. June 2015
Copyright Khronos Group 2015 - Page 1 Vulkan Overview June 2015 Copyright Khronos Group 2015 - Page 2 Khronos Connects Software to Silicon Open Consortium creating OPEN STANDARD APIs for hardware acceleration
More informationKhronos Connects Software to Silicon
Press Pre-Briefing GDC 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem All Materials Embargoed Until Tuesday 3 rd March, 12:01AM Pacific Time Copyright Khronos Group 2015 - Page
More informationStandards for WebVR. Neil Trevett. Khronos President Vice President Mobile Content,
Standards for WebVR Neil Trevett Khronos President Vice President Mobile Content, NVIDIA ntrevett@nvidia.com, @neilt3d October 2016 Copyright Khronos Group 2016 - Page 1 Khronos Open Standards Software
More informationPress Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem. Copyright Khronos Group Page 1
Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Copyright Khronos Group 2015 - Page 1 Khronos Connects Software to Silicon Open Consortium creating ROYALTY-FREE,
More informationSIGGRAPH Briefing August 2014
Copyright Khronos Group 2014 - Page 1 SIGGRAPH Briefing August 2014 Neil Trevett VP Mobile Ecosystem, NVIDIA President, Khronos Copyright Khronos Group 2014 - Page 2 Significant Khronos API Ecosystem Advances
More informationPress Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem. Copyright Khronos Group Page 1
Press Briefing SIGGRAPH 2015 Neil Trevett Khronos President NVIDIA Vice President Mobile Ecosystem Copyright Khronos Group 2015 - Page 1 Khronos Connects Software to Silicon Open Consortium creating ROYALTY-FREE,
More informationEcosystem Overview Neil Trevett Khronos President NVIDIA Vice President Developer
Ecosystem Overview Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem ntrevett@nvidia.com @neilt3d Copyright Khronos Group 2016 - Page 1 Khronos Mission Software Silicon Khronos is
More informationCopyright Khronos Group Page 1
OpenCL State of the Nation Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem OpenCL Working Group Chair ntrevett@nvidia.com @neilt3d Toronto, May 2017 Copyright Khronos Group 2017
More informationCopyright Khronos Group Page 1
OpenCL State of the Nation Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem OpenCL Working Group Chair ntrevett@nvidia.com @neilt3d Toronto, May 2017 Copyright Khronos Group 2017
More informationCopyright Khronos Group Page 1
Update on Khronos Standards for Vision and Machine Learning December 2017 Neil Trevett Khronos President NVIDIA VP Developer Ecosystem ntrevett@nvidia.com @neilt3d www.khronos.org Copyright Khronos Group
More informationStandards Update. Copyright Khronos Group Page 1
Standards Update VR/AR, 3D, Web, Vision and Deep Learning Neil Trevett Khronos President NVIDIA VP Developer Ecosystem ntrevett@nvidia.com @neilt3d www.khronos.org Copyright Khronos Group 2017 - Page 1
More informationOverview and AR/VR Roadmap
Khronos Group Inc. 2018 - Page 1 Overview and AR/ Roadmap Neil Trevett Khronos President NVIDIA VP Developer Ecosystems ntrevett@nvidia.com @neilt3d Khronos Group Inc. 2018 - Page 2 Khronos Connects Software
More informationOpen Standards for Building Virtual and Augmented Realities. Neil Trevett Khronos President NVIDIA VP Developer Ecosystems
Open Standards for Building Virtual and Augmented Realities Neil Trevett Khronos President NVIDIA VP Developer Ecosystems Khronos Mission Asian Members Software Silicon Khronos is an International Industry
More informationOpen API Standards for Mobile Graphics, Compute and Vision Processing GTC, March 2014
Open API Standards for Mobile Graphics, Compute and Vision Processing GTC, March 2014 Neil Trevett Vice President Mobile Ecosystem, NVIDIA President Khronos Copyright Khronos Group 2014 - Page 1 Khronos
More informationNeural Network Exchange Format
Copyright Khronos Group 2017 - Page 1 Neural Network Exchange Format Deploying Trained Networks to Inference Engines Viktor Gyenes, specification editor Copyright Khronos Group 2017 - Page 2 Outlook The
More informationVulkan Launch Webinar 18 th February Copyright Khronos Group Page 1
Vulkan Launch Webinar 18 th February 2016 Copyright Khronos Group 2016 - Page 1 Copyright Khronos Group 2016 - Page 2 The Vulkan Launch Webinar Is About to Start! Kathleen Mattson - Webinar MC, Khronos
More informationOpenCL Overview. Shanghai March Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group
Copyright Khronos Group, 2012 - Page 1 OpenCL Overview Shanghai March 2012 Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group Copyright Khronos Group, 2012 - Page 2 Processor
More informationOpenCL Press Conference
Copyright Khronos Group, 2011 - Page 1 OpenCL Press Conference Tokyo, November 2011 Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group Copyright Khronos Group, 2011 - Page
More informationVulkan 1.1 March Copyright Khronos Group Page 1
Vulkan 1.1 March 2018 Copyright Khronos Group 2018 - Page 1 Vulkan 1.1 Launch and Ongoing Momentum Strengthening the Ecosystem Improved developer tools (SDK, validation/debug layers) More rigorous conformance
More informationOpen Standard APIs for Augmented Reality
Copyright Khronos Group 2014 - Page 1 Open Standard APIs for Augmented Reality Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group Copyright Khronos Group 2014 - Page 2 Khronos
More informationCopyright Khronos Group 2012 Page 1. OpenCL 1.2. August 2012
Copyright Khronos Group 2012 Page 1 OpenCL 1.2 August 2012 Copyright Khronos Group 2012 Page 2 Khronos - Connecting Software to Silicon Khronos defines open, royalty-free standards to access graphics,
More informationAR Standards Update Austin, March 2012
AR Standards Update Austin, March 2012 Neil Trevett President, The Khronos Group Vice President Mobile Content, NVIDIA Copyright Khronos Group, 2012 - Page 1 Topics Very brief overview of Khronos Update
More informationEECS 487: Interactive Computer Graphics
EECS 487: Interactive Computer Graphics Lecture 21: Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan Console Games Why do games look and perform so much better on consoles than on PCs with
More informationVision Acceleration. Launch Briefing October Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group
Copyright Khronos Group 2014 - Page 1 Vision Acceleration Launch Briefing October 2014 Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group Copyright Khronos Group 2014 - Page
More informationThe Path to Embedded Vision & AI using a Low Power Vision DSP. Yair Siegel, Director of Segment Marketing Hotchips August 2016
The Path to Embedded Vision & AI using a Low Power Vision DSP Yair Siegel, Director of Segment Marketing Hotchips August 2016 Presentation Outline Introduction The Need for Embedded Vision & AI Vision
More informationCopyright Khronos Group, Page 1. OpenCL. GDC, March 2010
Copyright Khronos Group, 2011 - Page 1 OpenCL GDC, March 2010 Authoring and accessibility Application Acceleration System Integration Copyright Khronos Group, 2011 - Page 2 Khronos Family of Standards
More informationCopyright Khronos Group Page 1
OpenCL and Ecosystem State of the Nation Neil Trevett Khronos President NVIDIA Vice President Developer Ecosystem OpenCL Working Group Chair ntrevett@nvidia.com @neilt3d Oxford, May 2018 Copyright Khronos
More informationWebGL Meetup GDC Copyright Khronos Group, Page 1
WebGL Meetup GDC 2012 Copyright Khronos Group, 2012 - Page 1 Copyright Khronos Group, 2012 - Page 2 Khronos API Ecosystem Trends Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos
More informationKhronos Updates GDC 2017 Neil Trevett Vice President Developer Ecosystem, NVIDIA President,
Copyright Khronos Group 2017 - Page 1 Khronos Updates GDC 2017 Neil Trevett Vice President Developer Ecosystem, NVIDIA President, Khronos ntrevett@nvidia.com @neilt3d Copyright Khronos Group 2017 - Page
More informationNVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU
NVIDIA Think about Computing as Heterogeneous One Leo Liao, 1/29/2106, NTU GPGPU opens the door for co-design HPC, moreover middleware-support embedded system designs to harness the power of GPUaccelerated
More informationMultimedia in Mobile Phones. Architectures and Trends Lund
Multimedia in Mobile Phones Architectures and Trends Lund 091124 Presentation Henrik Ohlsson Contact: henrik.h.ohlsson@stericsson.com Working with multimedia hardware (graphics and displays) at ST- Ericsson
More informationOverview. Technology Details. D/AVE NX Preliminary Product Brief
Overview D/AVE NX is the latest and most powerful addition to the D/AVE family of rendering cores. It is the first IP to bring full OpenGL ES 2.0/3.1 rendering to the FPGA and SoC world. Targeted for graphics
More informationCopyright Khronos Group, Page 1. Khronos Overview. Taiwan, February 2012
Copyright Khronos Group, 2012 - Page 1 Khronos Overview Taiwan, February 2012 Copyright Khronos Group, 2012 - Page 2 Khronos - Connecting Software to Silicon Creating open, royalty-free API standards -
More informationPOWERVR MBX & SGX OpenVG Support and Resources
POWERVR MBX & SGX OpenVG Support and Resources Kristof Beets 3 rd Party Relations Manager - Imagination Technologies kristof.beets@imgtec.com Copyright Khronos Group, 2006 - Page 1 Copyright Khronos Group,
More informationOpen Standard APIs for Embedded Vision Processing
Copyright Khronos Group 2014 - Page 1 Open Standard APIs for Embedded Vision Processing Neil Trevett Vice President Mobile Ecosystem, NVIDIA President, Khronos Group Copyright Khronos Group 2014 - Page
More informationWindowing System on a 3D Pipeline. February 2005
Windowing System on a 3D Pipeline February 2005 Agenda 1.Overview of the 3D pipeline 2.NVIDIA software overview 3.Strengths and challenges with using the 3D pipeline GeForce 6800 220M Transistors April
More informationProfiling and Debugging OpenCL Applications with ARM Development Tools. October 2014
Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline
More informationKhronos and the Mobile Ecosystem
Copyright Khronos Group, 2011 - Page 1 Khronos and the Mobile Ecosystem Neil Trevett VP Mobile Content, NVIDIA President, Khronos Copyright Khronos Group, 2011 - Page 2 Topics It s not just about individual
More informationEnabling a Richer Multimedia Experience with GPU Compute. Roberto Mijat Visual Computing Marketing Manager
Enabling a Richer Multimedia Experience with GPU Compute Roberto Mijat Visual Computing Marketing Manager 1 What is GPU Compute Operating System and most application processing continue to reside on the
More informationOpenCL: History & Future. November 20, 2017
Mitglied der Helmholtz-Gemeinschaft OpenCL: History & Future November 20, 2017 OpenCL Portable Heterogeneous Computing 2 APIs and 2 kernel languages C Platform Layer API OpenCL C and C++ kernel language
More informationProfiling and Debugging Games on Mobile Platforms
Profiling and Debugging Games on Mobile Platforms Lorenzo Dal Col Senior Software Engineer, Graphics Tools Gamelab 2013, Barcelona 26 th June 2013 Agenda Introduction to Performance Analysis with ARM DS-5
More informationThe Bifrost GPU architecture and the ARM Mali-G71 GPU
The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our
More informationRevolutionizing the Datacenter
Power-Efficient Machine Learning using FPGAs on POWER Systems Ralph Wittig, Distinguished Engineer Office of the CTO, Xilinx Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Top-5
More informationAdding Advanced Shader Features and Handling Fragmentation
Copyright Khronos Group, 2010 - Page 1 Adding Advanced Shader Features and Handling Fragmentation How to enable your application on a wide range of devices Imagination Technologies Copyright Khronos Group,
More informationTechnology and Design Tools for Multicore Embedded Systems Software Development
Technology and Design Tools for Multicore Embedded Systems Software Development Yuriy Sheynin, Alexey Syschikov, Boris Sedov Saint Petersburg State University of Aerospace Instrumentation Why do we need
More informationDeep Learning Requirements for Autonomous Vehicles
Deep Learning Requirements for Autonomous Vehicles Pierre Paulin, Director of R&D Synopsys Inc. Chipex, 1 May 2018 1 Agenda Deep Learning and Convolutional Neural Networks for Embedded Vision Automotive
More informationMali Developer Resources. Kevin Ho ARM Taiwan FAE
Mali Developer Resources Kevin Ho ARM Taiwan FAE ARM Mali Developer Tools Software Development SDKs for OpenGL ES & OpenCL OpenGL ES Emulators Shader Development Studio Shader Library Asset Creation Texture
More informationCSE 591: GPU Programming. Introduction. Entertainment Graphics: Virtual Realism for the Masses. Computer games need to have: Klaus Mueller
Entertainment Graphics: Virtual Realism for the Masses CSE 591: GPU Programming Introduction Computer games need to have: realistic appearance of characters and objects believable and creative shading,
More informationGraph Streaming Processor
Graph Streaming Processor A Next-Generation Computing Architecture Val G. Cook Chief Software Architect Satyaki Koneru Chief Technology Officer Ke Yin Chief Scientist Dinakar Munagala Chief Executive Officer
More informationIntroduction to OpenGL ES 3.0
Introduction to OpenGL ES 3.0 Eisaku Ohbuchi Digital Media Professionals Inc. 2012 Digital Media Professionals Inc. All rights reserved. 12/Sep/2012 Page 1 Agenda DMP overview (quick!) OpenGL ES 3.0 update
More informationMobile AR Hardware Futures
Copyright Khronos Group, 2010 - Page 1 Mobile AR Hardware Futures Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group Two Perspectives NVIDIA - Tegra 2 mobile processor Khronos
More informationBifrost - The GPU architecture for next five billion
Bifrost - The GPU architecture for next five billion Hessed Choi Senior FAE / ARM ARM Tech Forum June 28 th, 2016 Vulkan 2 ARM 2016 What is Vulkan? A 3D graphics API for the next twenty years Logical successor
More informationOverview. Think Silicon is a privately held company founded in 2007 by the core team of Atmel MMC IC group
Nema An OpenGL & OpenCL Embedded Programmable Engine Georgios Keramidas & Iakovos Stamoulis Think Silicon mobile GRAPHICS Overview Think Silicon is a privately held company founded in 2007 by the core
More informationVulkan: Architecture positive How Vulkan maps to PowerVR GPUs Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics.
Vulkan: Architecture positive How Vulkan maps to PowerVR GPUs Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics www.imgtec.com Introduction Who am I? Kevin Sun Working at Imagination Technologies
More informationDave Shreiner, ARM March 2009
4 th Annual Dave Shreiner, ARM March 2009 Copyright Khronos Group, 2009 - Page 1 Motivation - What s OpenGL ES, and what can it do for me? Overview - Lingo decoder - Overview of the OpenGL ES Pipeline
More informationHSA Foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017!
Advanced Topics on Heterogeneous System Architectures HSA Foundation! Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017! Antonio R. Miele! Marco D. Santambrogio! Politecnico di Milano! 2
More informationRenderscript Accelerated Advanced Image and Video Processing on ARM Mali T-600 GPUs. Lihua Zhang, Ph.D. MulticoreWare Inc.
Renderscript Accelerated Advanced Image and Video Processing on ARM Mali T-600 GPUs Lihua Zhang, Ph.D. MulticoreWare Inc. lihua@multicorewareinc.com Overview More & more mobile apps are beginning to require
More informationStandards for Neural Networks Acceleration and Deployment
Copyright Khronos Group 2017 - Page 1 Standards for Neural Networks Acceleration and Deployment Radhakrishna Giduthuri Advanced Micro Devices radha.giduthuri@ieee.org Agenda Why we need a standard? Khronos
More informationPowerVR GPU IP from Wearables to Servers. Kristof Beets Director of Business Development May 2015
PowerVR GPU IP from Wearables to Servers Kristof Beets Director of Business Development May 2015 www.imgtec.com Expanding embedded GPU market opportunities Huge range of market opportunities equates to
More informationParallel Programming on Larrabee. Tim Foley Intel Corp
Parallel Programming on Larrabee Tim Foley Intel Corp Motivation This morning we talked about abstractions A mental model for GPU architectures Parallel programming models Particular tools and APIs This
More informationWebGL, WebCL and Beyond!
Copyright Khronos Group, 2011 - Page 1 WebGL, WebCL and Beyond! Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group Copyright Khronos Group, 2011 - Page 2 Topics in this Session
More informationSDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center
SDAccel Environment The Xilinx SDAccel Development Environment Bringing The Best Performance/Watt to the Data Center Introduction Data center operators constantly seek more server performance. Currently
More informationGPGPU on Mobile Devices
GPGPU on Mobile Devices Introduction Addressing GPGPU for very mobile devices Tablets Smartphones Introduction Why dedicated GPUs in mobile devices? Gaming Physics simulation for realistic effects 3D-GUI
More informationModern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design
Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant
More informationCSE 591/392: GPU Programming. Introduction. Klaus Mueller. Computer Science Department Stony Brook University
CSE 591/392: GPU Programming Introduction Klaus Mueller Computer Science Department Stony Brook University First: A Big Word of Thanks! to the millions of computer game enthusiasts worldwide Who demand
More informationEnable AI on Mobile Devices
Enable AI on Mobile Devices Scott Wang 王舒翀 Senior Segment Manager Mobile, BSG ARM Tech Forum 2017 14 th June 2017, Shenzhen AI is moving from core to edge Ubiquitous AI Safe and autonomous Mixed reality
More informationAdaptable Intelligence The Next Computing Era
Adaptable Intelligence The Next Computing Era Hot Chips, August 21, 2018 Victor Peng, CEO, Xilinx Pervasive Intelligence from Cloud to Edge to Endpoints >> 1 Exponential Growth and Opportunities Data Explosion
More informationHigher Level Programming Abstractions for FPGAs using OpenCL
Higher Level Programming Abstractions for FPGAs using OpenCL Desh Singh Supervising Principal Engineer Altera Corporation Toronto Technology Center ! Technology scaling favors programmability CPUs."#/0$*12'$-*
More informationCSCI 402: Computer Architectures. Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Parallel Processors (2) Fengguang Song Department of Computer & Information Science IUPUI 6.6 - End Today s Contents GPU Cluster and its network topology The Roofline performance
More informationCopyright Khronos Group, Page 1 SYCL. SG14, February 2016
Copyright Khronos Group, 2014 - Page 1 SYCL SG14, February 2016 BOARD OF PROMOTERS Over 100 members worldwide any company is welcome to join Copyright Khronos Group 2014 SYCL 1. What is SYCL for and what
More informationHKG OpenCL Support by NNVM & TVM. Jammy Zhou - Linaro
HKG18-417 OpenCL Support by NNVM & TVM Jammy Zhou - Linaro Agenda OpenCL Overview OpenCL in NNVM & TVM Current Status OpenCL Introduction Open Computing Language Open standard maintained by Khronos with
More informationCLICK TO EDIT MASTER TITLE STYLE. Click to edit Master text styles. Second level Third level Fourth level Fifth level
CLICK TO EDIT MASTER TITLE STYLE Second level THE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION
More informationWu Zhiwen.
Wu Zhiwen zhiwen.wu@intel.com Agenda Background information OpenCV DNN module OpenCL acceleration Vulkan backend Sample 2 What is OpenCV? Open Source Compute Vision (OpenCV) library 2500+ Optimized algorithms
More informationWorking with Metal Overview
Graphics and Games #WWDC14 Working with Metal Overview Session 603 Jeremy Sandmel GPU Software 2014 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission
More informationGPGPU. Peter Laurens 1st-year PhD Student, NSC
GPGPU Peter Laurens 1st-year PhD Student, NSC Presentation Overview 1. What is it? 2. What can it do for me? 3. How can I get it to do that? 4. What s the catch? 5. What s the future? What is it? Introducing
More information3D Graphics in Future Mobile Devices. Steve Steele, ARM
3D Graphics in Future Mobile Devices Steve Steele, ARM Market Trends Mobile Computing Market Growth Volume in millions Mobile Computing Market Trends 1600 Smart Mobile Device Shipments (Smartphones and
More informationNeil Trevett Vice President, NVIDIA OpenCL Chair Khronos President. Copyright Khronos Group, Page 1
Neil Trevett Vice President, NVIDIA OpenCL Chair Khronos President Copyright Khronos Group, 2009 - Page 1 Introduction and aims of OpenCL - Neil Trevett, NVIDIA OpenCL Specification walkthrough - Mike
More informationThroughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Naveen Suda, Vikas Chandra *, Ganesh Dasika *, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, Yu
More informationOptimizing and Profiling Unity Games for Mobile Platforms. Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June
Optimizing and Profiling Unity Games for Mobile Platforms Angelo Theodorou Senior Software Engineer, MPG Gamelab 2014, 25 th -27 th June 1 Agenda Introduction ARM and the presenter Preliminary knowledge
More informationVulkan: Scaling to Multiple Threads. Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics
Vulkan: Scaling to Multiple Threads Kevin sun Lead Developer Support Engineer, APAC PowerVR Graphics www.imgtec.com Introduction Who am I? Kevin Sun Working at Imagination Technologies Take responsibility
More informationOpenCL The Open Standard for Heterogeneous Parallel Programming
OpenCL The Open Standard for Heterogeneous Parallel Programming March 2009 Copyright Khronos Group, 2009 - Page 1 Close-to-the-Silicon Standards Khronos creates Foundation-Level acceleration APIs - Needed
More informationWebGL, WebCL and OpenCL
Copyright Khronos Group, 2011 - Page 1 WebGL, WebCL and OpenCL Neil Trevett Vice President Mobile Content, NVIDIA President, The Khronos Group Copyright Khronos Group, 2011 - Page 2 Processor Parallelism
More informationTIOVX TI s OpenVX Implementation
TIOVX TI s OpenVX Implementation Aish Dubey Product Marketing, Automotive Processors Embedded Vision Summit, 3 May 2017 1 TI SOC platform heterogeneous cores High level processing Object detection and
More informationMobile Graphics Ecosystem. Tom Olson OpenGL ES working group chair
OpenGL ES in the Mobile Graphics Ecosystem Tom Olson OpenGL ES working group chair Director, Graphics Research, ARM Ltd 1 Outline Why Mobile Graphics? OpenGL ES Overview Getting Started with OpenGL ES
More informationModern Processor Architectures. L25: Modern Compiler Design
Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions
More informationShrinath Shanbhag Senior Software Engineer Microsoft Corporation
Accelerating GPU inferencing with DirectML and DirectX 12 Shrinath Shanbhag Senior Software Engineer Microsoft Corporation Machine Learning Machine learning has become immensely popular over the last decade
More informationOpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision. Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017
OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017 Agenda Why Zynq SoCs for Traditional Computer Vision Automated
More informationCS427 Multicore Architecture and Parallel Computing
CS427 Multicore Architecture and Parallel Computing Lecture 6 GPU Architecture Li Jiang 2014/10/9 1 GPU Scaling A quiet revolution and potential build-up Calculation: 936 GFLOPS vs. 102 GFLOPS Memory Bandwidth:
More informationBringing it all together: The challenge in delivering a complete graphics system architecture. Chris Porthouse
Bringing it all together: The challenge in delivering a complete graphics system architecture Chris Porthouse System Integration & the role of standards Content Ecosystem Java Execution Environment Native
More information