Comprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications
|
|
- Brittney Lucas
- 6 years ago
- Views:
Transcription
1 Comprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications Helena Zheng ML Group, Arm Arm Technical Symposia 2017, Taipei
2 Machine Learning is a Subset of Artificial Intelligence AI means many things to many people Artificial Intelligence Machine Learning Perception & Vision Natural Language Processing Knowledge Representation Planning & Navigation Generalized Intelligence ML itself has a lot of depth 3 3
3 Why Artificial Intelligence(AI) Exploding Now Availability of increased data sourced at the edge with ubiquitous powerful compute! Compute 2010 Data zettabyte IP Traffic zettabyte 4 4
4 AI Presents Significant Opportunity for Innovation VR/MR Robotics Drones Shipping & logistics IoT Home, surveillance & analytics Automotive Mobile 5 5
5 Distributing Intelligence Cloud-based training High-performance processing On-device learning Security and privacy for your data AI in your hand & cloud Real-time inference for autonomous systems 6 6
6 Why is On-device ML Driving AI to the Edge? Bandwidth Power Cost Latency Privacy 7 7
7 Arm ML Platform Enables Efficiency Flexibility Freedom 8 8
8 Arm s ML Computing Platform AI Applications Incorporating ML, CV, speech recognition etc. Applications Compute library Neural network frameworks (e.g. TensorFlow, Caffe, AndroidNN) Optional Spirit libraries & model sets Spirit metadata library Stable SW interfaces Arm DS-5 / Keil tools / compilers / drivers CPU CPU GPU Spirit Computer Vision Edge devices SVE 9 9
9 Components of Arm ML Platform Software Specialized Acceleration Hardware 10 10
10 Software Development 11 11
11 Arm Compute Library Faster, advanced processing What is the Arm Compute Library? Functions for CV and deeplearning algorithms Optimized for Arm CPU and GPU OS and platform agnostic No fee, MIT license Delivers faster processing 4.6x faster than stock OpenCV on NEON Offers OpenCV and Open VX compatibility Use as a plug-in backend for your own runtime implementation Available now:
12 Arm Compute Library Partners 13 13
13 Specialized Acceleration 14 14
14 Computer Vision (CV)
15 Spirit for Object Detection and Localization Sensor ISP Spirit CV pre-processor Image stream CPU GPU Sensor interface Feature extraction Classifier 1 Classifier 2 Metadata stream (Regions of interest) Acceleration Beth Ben 16 16
16 Comparison with Neural Network Framework Solutions Yolo Spirit SSD Neural Network 17 17
17 Spirit: Key Features Object localization (>200k locations in full HD) Size variation (~20x for full HD) Scalable to 4K without performance compromise Real time, 60 fps, no dropped frames Invariance to optical distortions Invariance to illumination conditions High occlusion tolerance Suitable for stationary and moving cameras 18 18
18 Comparison with a DSP Spirit uses a form of HOG*/ SVM* baked into an efficient hardware design Using a DSP to achieve the same performance E.g. Pedestrian Detection on a DSP Processed at VGA resolution Achieving 5fps Operating at 40MHz/50MHz Scaling this to Spirit performance levels of 1080p60 would require the DSP to run at 3.24GHz *Histogram of Oriented Gradients *Support Vector Machines
19 ML on Mali GPUs
20 Relative Eneryg Efficency Mali GPUs: Increasing ML Throughput and Efficiency Increasing efficiency SGEMM FP32 Mali-G71 SGEMM FP16 Mali-G72 17% Efficiency gain GEMM depicts core functionality of ML algorithms Mali-G72 has several optimizations to improve ML inference Less power-hungry FMA unit Bigger L1 cache in the execution engine Mali-G72 is the most efficient Mali GPU for machine learning 21 21
21 Hardware 22 22
22 AI Applications at the Edge on Arm Detect plant diseases Sort cucumbers Detect Caltrain delays 23 23
23 Instruction Sets for AI Cortex-A Additional dot product instructions (Cortex-A55 and Cortex-A75) New Scalable Vector Extensions (SVE) instructions Cortex-M Optimized CMSIS-DSP libraries for matrix multiplication Closely-coupled acceleration Improved performance and efficiency (for broader use cases) Flexibility in multi-core computing with Arm DynamIQ technology 25 25
24 DynamIQ: New Cluster Design for New Cores Arm DynamIQ big.little systems: 26 Greater product differentiation and scalability Improved energy efficiency and performance SW compatibility with Energy Aware Scheduling (EAS) Private L2 and shared L3 caches Local cache close to processors L3 cache shared between all cores DynamIQ Shared Unit (DSU) Contains L3, Snoop Control Unit (SCU) and all cluster interfaces Additional instructions for ML 26 SCU ACP Cortex-A75 32b/64b Core Private L2 cache 1b+7L AMBA4 ACE.. Peripheral Port DynamIQ Shared Unit (DSU) 2b+6L 1b+2L 1b+3L Example: DynamIQ big.little configurations Cortex-A55 32b/64b Core Private L2 cache Async Bridges Shared L3 cache 4b+4L 1b+4L
25 New DynamIQ-based CPUs for New Possibilities Cortex-A75 processor >50% more performance compared to current devices Cortex-A55 processor 2.5x higher power efficiency compared to current devices 27 Estimated device performance using SPECINT2006, final device results may vary Comparison using Cortex-A73 at 2.4GHz vs Cortex-A75 at 3GHz 27 Comparison using Cortex-A53 in 28nm devices vs Cortex-A55 in 16nm devices
26 Time (lower is better) MAC/cycle Enhanced Architecture for Emerging Use Cases Computer Vision Machine Learning FP16 FP16 0 Harris Corners (relative to Cortex-A53) Cortex-A53 (FP32) Cortex-A55 (FP32) Cortex-A55(FP16) Cortex-A73 (FP32) Cortex-A75(FP32) Cortex-A75(FP16) General Matrix Multiply (relative to Cortex-A53) Cortex-A53 (FP32) Cortex-A55 (FP16) 8-bit dot product Cortex-A55 (FP32) Cortex-A55 (8-bit) 28 28
27 Introducing the Scalable Vector Extension (SVE) + pred = for (i = 0; i < n; ++i) INDEX i CMPLT n n-2 n-1 n n Gather-load and scatter-store Per-lane predication Predicate-driven loop control and management = + pred = = = Vector partitioning and software-managed speculation Extended floating-point horizontal reductions 29 29
28 Instruction Sets for AI Cortex-A Additional dot product instructions (Cortex-A55 and Cortex-A75) New SVE instructions Cortex-M Optimized CMSIS-DSP libraries for matrix multiplication Closely-coupled acceleration Improved performance and efficiency (for broader use cases) Flexibility in multi-core computing with DynamIQ technology 30 30
29 Relative throughput Software Optimizations: Cortex-M Example Convolution Use of partial im2col to reduce the memory footprint Optimized data dimension layout (Height-Width-Channel) to save im2col overhead Pooling Split into x-pooling and y-pooling instead of window-based 5.1X improvements compared to Caffe-like implementation Activation ReLU: use SIMD within a register, 2.6X improvement compared to Caffe-like implementation Sigmoid and Tanh: use table look-up CIFAR-10 network runtime improvement Conv Pooling ReLU Total Baseline New kernels *Baseline uses CMSIS 1D Conv and Caffe-like Pooling/ReLU The new kernels will be integrated into future versions of CMSIS 4x higher perf 31 31
30 Instruction sets for AI Cortex-A Additional dot product instructions (Cortex-A55 and Cortex-A75) New SVE instructions Cortex-M Optimized CMSIS-DSP libraries for matrix multiplication Closely-coupled acceleration Improved performance and efficiency (for broader use cases) Flexibility in multi-core computing with DynamIQ technology 32 32
31 Interface to Acceleration Logic (1) Configure registers for task (3) Processing completed DynamIQ cluster PP ACP (2) Fetches data from CPU memory (4) Writes result into CPU memory Accelerator (3) Carries out acceleration DynamIQ cluster PP ACP (2) Carries out computation (1) Writes data into CPU memory (4) Reads result from CPU memory or sends data for onward processing I/O agent 33 33
32 Accelerator Flexible Acceleration Platform Closely coupled Independent Off-chip coherent DynamIQ Cluster Level-3 Cache Accelerator Accelerator Accel Cache CCIX XP CMN-600 XP XP Agile System Cache Agile System Cache XP DMC-620 XP DMC-620 XP CCIX 34 34
33 Arm ML Platform Enables Efficiency Flexibility Freedom 35 35
34 Thank You! Danke! Merci! 謝謝! ありがとう! Gracias! Kiitos! 38
35 The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. 39
Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving
Cortex-A75 and Cortex- DynamIQ processors Powering applications from mobile to autonomous driving Lionel Belnet Sr. Product Manager Arm Arm Tech Symposia 2017 Agenda Market growth and trends DynamIQ technology
More informationAccelerating intelligence at the edge for embedded and IoT applications
Accelerating intelligence at the edge for embedded and IoT applications Arm Tech Symposia 2017 Agenda The emergence of intelligence at the edge Requirements for intelligent edge computing IP and technologies
More informationAn introduction to Machine Learning silicon
An introduction to Machine Learning silicon November 28 2017 Insight for Technology Investors AI/ML terminology Artificial Intelligence Machine Learning Deep Learning Algorithms: CNNs, RNNs, etc. Additional
More informationCortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving
Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving Stefan Rosinger Director, Product Management Arm Arm TechCon 2017 Agenda Market growth and trends DynamIQ
More informationDeep Learning on Arm Cortex-M Microcontrollers. Rod Crawford Director Software Technologies, Arm
Deep Learning on Arm Cortex-M Microcontrollers Rod Crawford Director Software Technologies, Arm What is Machine Learning (ML)? Artificial Intelligence Machine Learning Deep Learning Neural Networks Additional
More informationMaximizing heterogeneous system performance with ARM interconnect and CCIX
Maximizing heterogeneous system performance with ARM interconnect and CCIX Neil Parris, Director of product marketing Systems and software group, ARM Teratec June 2017 Intelligent flexible cloud to enable
More informationThe Changing Face of Edge Compute
The Changing Face of Edge Compute 2018 Arm Limited Alvin Yang Nov 2018 Market trends acceleration of technology deployment 26 years 4 years 100 billion chips shipped 100 billion chips shipped 1 Trillion
More informationArtificial Intelligence Enriched User Experience with ARM Technologies
Artificial Intelligence Enriched User Experience with ARM Technologies Daniel Heo Senior Segment Manager Mobile, BSG, ARM ARM Tech Forum Singapore July 12 th 2017 Global AI survey: the world is ready 71
More informationAdvanced IP solutions enabling the autonomous driving revolution
Advanced IP solutions enabling the autonomous driving revolution Chris Turner Director, Emerging Technology & Strategy, Embedded & Automotive Arm Shanghai, Beijing, Shenzhen Arm Tech Symposia 2017 Agenda
More informationA Secure and Connected Intelligent Future. Ian Smythe Senior Director Marketing, Client Business Arm Tech Symposia 2017
A Secure and Connected Intelligent Future 1 2017 Arm Copyright Limited Arm 2017 Ian Smythe Senior Director Marketing, Client Business Arm Tech Symposia 2017 Arm: The Industry s Architecture of Choice 50
More informationBringing Intelligence to Enterprise Storage Drives
Bringing Intelligence to Enterprise Storage Drives Neil Werdmuller Director Storage Solutions Arm Santa Clara, CA 1 Who am I? 28 years experience in embedded Lead the storage solutions team Work closely
More informationARM instruction sets and CPUs for wide-ranging applications
ARM instruction sets and CPUs for wide-ranging applications Chris Turner Director, CPU technology marketing ARM Tech Forum Taipei July 4 th 2017 ARM computing is everywhere #1 shipping GPU in the world
More informationDynamIQ Processor Designs Using Cortex-A75 & Cortex-A55 for 5G Networks
DynamIQ Processor Designs Using Cortex-A75 & Cortex-A55 for 5G Networks Jeff Maguire Senior Product Manager Infrastructure IP Product Management Arm 2017 Arm Limited Arm Tech Symposia 2017 Agenda 5G networks
More informationEnable AI on Mobile Devices
Enable AI on Mobile Devices Scott Wang 王舒翀 Senior Segment Manager Mobile, BSG ARM Tech Forum 2017 14 th June 2017, Shenzhen AI is moving from core to edge Ubiquitous AI Safe and autonomous Mixed reality
More informationUnleash the DSP performance of Arm Cortex processors
Unleash the DSP performance of Arm Cortex processors Arm Tech Symposia 2017 Lionel Belnet Senior Product Manager Agenda Unleash the DSP performance of Cortex processors 1 Introducing Arm Cortex technology
More informationWAVE ONE MAINFRAME WAVE THREE INTERNET WAVE FOUR MOBILE & CLOUD WAVE TWO PERSONAL COMPUTING & SOFTWARE Arm Limited
WAVE ONE MAINFRAME WAVE THREE INTERNET WAVE FOUR MOBILE & CLOUD WAVE TWO PERSONAL COMPUTING & SOFTWARE Artificial Intelligence Fifth wave Data-driven computing era IoT Generating data 5G 5G Transporting
More informationDynamIQ Processor Designs Using Cortex-A75 & Cortex- A55 for 5G Networks
DynamIQ Processor Designs Using Cortex-A75 & Cortex- A55 for 5G Networks 2017 Arm Limited David Koenen Sr. Product Manager, Arm Arm Tech Symposia 2017, Taipei Agenda 5G networks Ecosystem software to support
More informationMachine learning for the Internet of Things
Machine learning for the Internet of Things Chris Shore Director of Embedded Solutions Arm 2018 Arm Limited April 2018 More Intelligence at the Edge Arm Cortex-M Expanding opportunity for the embedded
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 Arm K.K. Senior FAE Ryuji Tanaka Overview Today we will talk about applied machine learning (ML) on Arm. My aim for
More informationHow to Build Optimized ML Applications with Arm Software
How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 ML Group Overview Today we will talk about applied machine learning (ML) on Arm. My aim for today is to show you just
More informationBringing Intelligence to Enterprise Storage Drives
Bringing Intelligence to Enterprise Storage Drives Neil Werdmuller Director Storage Solutions Arm Santa Clara, CA 1 Who am I? 28 years experience in embedded Lead the storage solutions team Work closely
More information2017 Arm Limited. How to design an IoT SoC and get Arm CPU IP for no upfront license fee
2017 Arm Limited How to design an IoT SoC and get Arm CPU IP for no upfront license fee An enhanced Arm DesignStart Building on a strong foundation Successfully used by 1000s of designers, researchers
More informationEach Milliwatt Matters
Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets
More informationCCIX: a new coherent multichip interconnect for accelerated use cases
: a new coherent multichip interconnect for accelerated use cases Akira Shimizu Senior Manager, Operator relations Arm 2017 Arm Limited Arm 2017 Interconnects for different scale SoC interconnect. Connectivity
More informationUsing Virtual Platforms To Improve Software Verification and Validation Efficiency
Using Virtual Platforms To Improve Software Verification and Validation Efficiency Odin Shen Staff FAE Arm Arm Tech Symposia Taiwan 2017 Software complexity and best practices Software Costs Increasing
More informationExploring System Coherency and Maximizing Performance of Mobile Memory Systems
Exploring System Coherency and Maximizing Performance of Mobile Memory Systems Shanghai: William Orme, Strategic Marketing Manager of SSG Beijing & Shenzhen: Mayank Sharma, Product Manager of SSG ARM Tech
More informationMaking progress vs strategy
Making progress vs strategy Ian Thornton, Head of Investor Relations Arm is a subsidiary of 1 Arm update Arm refresher H1 update Increasing revenues and investments Progress vs strategy Arm in servers
More informationA Developer's Guide to Security on Cortex-M based MCUs
A Developer's Guide to Security on Cortex-M based MCUs 2018 Arm Limited Nazir S Arm Tech Symposia India Agenda Why do we need security? Types of attacks and security assessments Introduction to TrustZone
More informationArm s Latest CPU for Laptop-Class Performance
Arm s Latest CPU for Laptop-Class Performance 2018 Arm Limited Aditya Bedi Arm Tech Symposia India Untethered. Connected. Immersive. Innovation continues to drive growth and performance demands on our
More informationOptimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs
Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs Niu Feng Technical Specialist, ARM Tech Symposia 2016 Agenda Introduction Challenges: Optimizing cache coherent subsystem
More informationECE 571 Advanced Microprocessor-Based Design Lecture 22
ECE 571 Advanced Microprocessor-Based Design Lecture 22 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 19 April 2018 HW#11 will be posted Announcements 1 Reading 1 Exploring DynamIQ
More informationBeyond TrustZone PSA Reed Hinkel Senior Manager Embedded Security Market Development
Beyond TrustZone PSA Reed Hinkel Senior Manager Embedded Security Market Development Part1 - PSA Tech Seminars 2017 Agenda Platform Security Architecture Architecture overview Trusted Firmware-M IoT Threat
More informationArm s First-Generation Machine Learning Processor
Arm s First-Generation Machine Learning Processor Ian Bratt 2018 Arm Limited Introducing the Arm Machine Learning (ML) Processor Optimized ground-up architecture for machine learning processing Massive
More informationThe Bifrost GPU architecture and the ARM Mali-G71 GPU
The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our
More informationBifrost - The GPU architecture for next five billion
Bifrost - The GPU architecture for next five billion Hessed Choi Senior FAE / ARM ARM Tech Forum June 28 th, 2016 Vulkan 2 ARM 2016 What is Vulkan? A 3D graphics API for the next twenty years Logical successor
More informationTIOVX TI s OpenVX Implementation
TIOVX TI s OpenVX Implementation Aish Dubey Product Marketing, Automotive Processors Embedded Vision Summit, 3 May 2017 1 TI SOC platform heterogeneous cores High level processing Object detection and
More informationDEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE. Dennis Lui August 2017
DEEP NEURAL NETWORKS CHANGING THE AUTONOMOUS VEHICLE LANDSCAPE Dennis Lui August 2017 THE RISE OF GPU COMPUTING APPLICATIONS 10 7 10 6 GPU-Computing perf 1.5X per year 1000X by 2025 ALGORITHMS 10 5 1.1X
More informationCompute solutions for mass deployment of autonomy
Compute solutions for mass deployment of autonomy Rod Watt Director of Vehicle Architecture and System Analysis Introduction 2 From inception to now 1990 Joint venture between Acorn Computers and Apple.
More informationAddressing 7nm Arm DynamIQ Cluster Design Challenges Using the Cadence Digital Implementation Flow
Addressing 7nm Arm DynamIQ Cluster Design Challenges Using the Cadence Digital Implementation Flow Shawn Hung Sr. Engineering Manager, Arm Jerry Chen Sr. AE Manager, Cadence Arm Tech Symposia 2017, Taipei
More informationConnect Your IoT Device: Bluetooth 5, , NB-IoT
Connect Your IoT Device: Bluetooth 5, 802.15.4, NB-IoT Craig Tou Business Development Manager, Arm Arm Tech Symposia 2017, Taipei IoT Devices - Everything Connects New classes of connectivity for a new
More informationThe Path to Embedded Vision & AI using a Low Power Vision DSP. Yair Siegel, Director of Segment Marketing Hotchips August 2016
The Path to Embedded Vision & AI using a Low Power Vision DSP Yair Siegel, Director of Segment Marketing Hotchips August 2016 Presentation Outline Introduction The Need for Embedded Vision & AI Vision
More informationImplementing debug. and trace access. through functional I/O. Alvin Yang Staff FAE. Arm Tech Symposia Arm Limited
Implementing debug and trace access through functional I/O Alvin Yang Staff FAE Arm Tech Symposia 2017 Agenda Debug and trace access limitations A new approach Protocol based Bare metal vs mission mode
More informationProfiling and Debugging OpenCL Applications with ARM Development Tools. October 2014
Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline
More informationArm crossplatform. VI-HPS platform October 16, Arm Limited
Arm crossplatform tools VI-HPS platform October 16, 2018 An introduction to Arm Arm is the world's leading semiconductor intellectual property supplier We license to over 350 partners: present in 95% of
More informationBuilding firmware update: The devil is in the details
Building firmware update: The devil is in the details Atsushi Haruta, IoT Services Group, Arm Arm Tech Symposia Japan 2017 Arm Mbed: Secure device management Application Cloud Mbed Cloud Secure, scalable,
More informationDesigning, developing, debugging ARM Cortex-A and Cortex-M heterogeneous multi-processor systems
Designing, developing, debugging ARM and heterogeneous multi-processor systems Kinjal Dave Senior Product Manager, ARM ARM Tech Symposia India December 7 th 2016 Topics Introduction System design Software
More informationNew ARMv8-R technology for real-time control in safetyrelated
New ARMv8-R technology for real-time control in safetyrelated applications James Scobie Product manager ARM Technical Symposium China: Automotive, Industrial & Functional Safety October 31 st 2016 November
More informationBuilding High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink. Robert Kaye
Building High Performance, Power Efficient Cortex and Mali systems with ARM CoreLink Robert Kaye 1 Agenda Once upon a time ARM designed systems Compute trends Bringing it all together with CoreLink 400
More informationBuilding blocks for 64-bit Systems Development of System IP in ARM
Building blocks for 64-bit Systems Development of System IP in ARM Research seminar @ University of York January 2015 Stuart Kenny stuart.kenny@arm.com 1 2 64-bit Mobile Devices The Mobile Consumer Expects
More informationA NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017
A NEW COMPUTING ERA JENSEN HUANG, FOUNDER & CEO GTC CHINA 2017 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 6 10 5 1.1X per year 10 4 10 3 10 2 1.5X per year Single-threaded
More informationEmerging Vision Technologies: Enabling a New Era of Intelligent Devices
Emerging Vision Technologies: Enabling a New Era of Intelligent Devices Computer vision overview Computer vision is being integrated in our daily lives Acquiring, processing, and understanding visual data
More informationNVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM. Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive)
NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive) NVDLA NVIDIA DEEP LEARNING ACCELERATOR IP Core for deep learning part of NVIDIA s Xavier
More informationArm TrustZone Armv8-M Primer
Arm TrustZone Armv8-M Primer Odin Shen Staff FAE Arm Arm Techcon 2017 Security Security technologies review Application Level Security Designed with security in mind: authentication and encryption Privilege
More informationMali-G72 Enabling tomorrow s technology today
Mali-G72 Enabling tomorrow s technology today Alan Tsai Senior Regional Marketing Manager Media Processing Group, ARM ARM Tech Forum Taipei July 4 th 2017 Mali High Performance GPU success 2 Mali-G71 in
More informationEmbedded GPGPU and Deep Learning for Industrial Market
Embedded GPGPU and Deep Learning for Industrial Market Author: Dan Mor GPGPU and HPEC Product Line Manager September 2018 Table of Contents 1. INTRODUCTION... 3 2. DIFFICULTIES IN CURRENT EMBEDDED INDUSTRIAL
More informationBuilding Ultra-Low Power Wearable SoCs
Building Ultra-Low Power Wearable SoCs 1 Wearable noun An item that can be worn adjective Easy to wear, suitable for wearing 2 Wearable Opportunity: Fastest Growing Market Segment Projected Growth from
More informationARM processors driving automotive innovation
ARM processors driving automotive innovation Chris Turner Director of advanced technology marketing, CPU group ARM tech forums, Seoul and Taipei June/July 2016 The ultimate intelligent connected device
More informationEvolving IP configurability and the need for intelligent IP configuration
Evolving IP configurability and the need for intelligent IP configuration Mayank Sharma Product Manager ARM Tech Symposia India December 7 th 2016 Increasing IP integration costs per node $140 $120 $M
More informationTEXAS INSTRUMENTS DEEP LEARNING (TIDL) GOES HERE FOR SITARA PROCESSORS GOES HERE
YOUR TEXAS INSTRUMENTS VIDEO TITLE DEEP LEARNING (TIDL) GOES HERE FOR SITARA PROCESSORS OVERVIEW THE SUBTITLE GOES HERE Texas Instruments Deep Learning (TIDL) for Sitara Processors Overview Texas Instruments
More informationTEGRA K1 AND THE AUTOMOTIVE INDUSTRY. Gernot Ziegler, Timo Stich
TEGRA K1 AND THE AUTOMOTIVE INDUSTRY Gernot Ziegler, Timo Stich Previously: Tegra in Automotive Infotainment / Navigation Digital Instrument Cluster Passenger Entertainment TEGRA K1 with Kepler GPU GPU:
More informationXPU A Programmable FPGA Accelerator for Diverse Workloads
XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for
More informationIntelligent Interconnect for Autonomous Vehicle SoCs. Sam Wong / Chi Peng, NetSpeed Systems
Intelligent Interconnect for Autonomous Vehicle SoCs Sam Wong / Chi Peng, NetSpeed Systems Challenges Facing Autonomous Vehicles Exploding Performance Requirements Real-Time Processing of Sensors Ultra-High
More informationAdaptable Intelligence The Next Computing Era
Adaptable Intelligence The Next Computing Era Hot Chips, August 21, 2018 Victor Peng, CEO, Xilinx Pervasive Intelligence from Cloud to Edge to Endpoints >> 1 Exponential Growth and Opportunities Data Explosion
More informationAccelerate Ceph By SPDK on AArch64
Accelerate Ceph By on AArch64 Jun He, jun.he@arm.com 2018 Arm Limited Tone Zhang, tone.zhang@arm.com 2018/3/9 2018 Arm Limited What s? Storage Performance Development Kit A set of tools and libraries to
More informationNew Approaches to Connected Device Security
New Approaches to Connected Device Security Erik Jacobson Architecture Marketing Director Arm Arm Techcon 2017 - If you connect it to the Internet, someone will try to hack it. - If what you put on the
More informationA backward glance and a forward view
Arm Limited is a subsidiary of A backward glance and a forward view Ian Thornton, Head of Investor Relations Tokyo 18 May 2018 Arm update A Backward Glance: Progress in 2017 Financials Investments / hiring
More informationBrainchip OCTOBER
Brainchip OCTOBER 2017 1 Agenda Neuromorphic computing background Akida Neuromorphic System-on-Chip (NSoC) Brainchip OCTOBER 2017 2 Neuromorphic Computing Background Brainchip OCTOBER 2017 3 A Brief History
More informationConnect your IoT device: Bluetooth 5, , NB-IoT
Connect your IoT device: Bluetooth 5, 802.15.4, NB-IoT Prithi Ramakrishnan Arm TechTalk 2017 IoT connectivity technologies Multiple standards, different applications Throughput Unlicensed >100Mbps Wi-Fi
More informationA NEW COMPUTING ERA. Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA
A NEW COMPUTING ERA Shanker Trivedi Senior Vice President Enterprise Business at NVIDIA THE ERA OF AI AI CLOUD MOBILE PC 2 TWO FORCES DRIVING THE FUTURE OF COMPUTING 10 7 Transistors (thousands) 10 5 1.1X
More informationThe Benefits of GPU Compute on ARM Mali GPUs
The Benefits of GPU Compute on ARM Mali GPUs Tim Hartley 1 SEMICON Europa 2014 ARM Introduction World leading semiconductor IP Founded in 1990 1060 processor licenses sold to more than 350 companies >
More informationXilinx ML Suite Overview
Xilinx ML Suite Overview Yao Fu System Architect Data Center Acceleration Xilinx Accelerated Computing Workloads Machine Learning Inference Image classification and object detection Video Streaming Frame
More informationINTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES. Eli Savransky Principal Architect - CTO Office Mobile BU NVIDIA corp.
INTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES Eli Savransky Principal Architect - CTO Office Mobile BU NVIDIA corp. Computer Vision in Mobile Tegra K1 It s time! AGENDA Use cases categories
More informationBeyond TrustZone Security Enclaves Reed Hinkel Senior Manager Embedded Security Market Develop
Beyond TrustZone Security Enclaves Reed Hinkel Senior Manager Embedded Security Market Develop Part2 Security Enclaves Tech Seminars 2017 Agenda New security technology for IoT Security Enclaves CryptoIsland
More informationSmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center
SmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center Jeff Defilippi Senior Product Manager Arm #Arm Tech Symposia The Cloud to Edge Infrastructure Foundation for a World of 1T Intelligent
More informationTHE NVIDIA DEEP LEARNING ACCELERATOR
THE NVIDIA DEEP LEARNING ACCELERATOR INTRODUCTION NVDLA NVIDIA Deep Learning Accelerator Developed as part of Xavier NVIDIA s SOC for autonomous driving applications Optimized for Convolutional Neural
More informationInference Optimization Using TensorRT with Use Cases. Jack Han / 한재근 Solutions Architect NVIDIA
Inference Optimization Using TensorRT with Use Cases Jack Han / 한재근 Solutions Architect NVIDIA Search Image NLP Maps TensorRT 4 Adoption Use Cases Speech Video AI Inference is exploding 1 Billion Videos
More informationEmbedded Vision Solutions
FLEXIBLE SOLUTIONS FOR EMBEDDED VISION PROCESSING AT THE EDGE Embedded Vision Solutions Embedded vision offers a promising future with many exciting new applications entering the market. These systems
More informationResearch Faculty Summit Systems Fueling future disruptions
Research Faculty Summit 2018 Systems Fueling future disruptions Wolong: A Back-end Optimizer for Deep Learning Computation Jilong Xue Researcher, Microsoft Research Asia System Challenge in Deep Learning
More informationModeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces
Modeling Performance Use Cases with Traffic Profiles Over ARM AMBA Interfaces Li Chen, Staff AE Cadence China Agenda Performance Challenges Current Approaches Traffic Profiles Intro Traffic Profiles Implementation
More informationEFFICIENT INFERENCE WITH TENSORRT. Han Vanholder
EFFICIENT INFERENCE WITH TENSORRT Han Vanholder AI INFERENCING IS EXPLODING 2 Trillion Messages Per Day On LinkedIn 500M Daily active users of iflytek 140 Billion Words Per Day Translated by Google 60
More informationAutonomous Driving Solutions
Autonomous Driving Solutions Oct, 2017 DrivePX2 & DriveWorks Marcus Oh (moh@nvidia.com) Sr. Solution Architect, NVIDIA This work is licensed under a Creative Commons Attribution-Share Alike 4.0 (CC BY-SA
More informationARM Performance Libraries Current and future interests
ARM Performance Libraries Current and future interests Chris Goodyer Senior Engineering Manager, HPC Software Workshop on Batched, Reproducible, and Reduced Precision BLAS 25 th February 2017 ARM Performance
More informationHardware- Software Co-design at Arm GPUs
Hardware- Software Co-design at Arm GPUs Johan Grönqvist MCC 2017 - Uppsala About Arm Arm Mali GPUs: The World s #1 Shipping Graphics Processor 151 Total Mali licenses 21 Mali video and display licenses
More informationMaking Sense of Artificial Intelligence: A Practical Guide
Making Sense of Artificial Intelligence: A Practical Guide JEDEC Mobile & IOT Forum Copyright 2018 Young Paik, Samsung Senior Director Product Planning Disclaimer This presentation and/or accompanying
More informationAnalyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components
Analyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components By William Orme, Strategic Marketing Manager, ARM Ltd. and Nick Heaton, Senior Solutions Architect, Cadence Finding
More informationArm Mbed Edge. Shiv Ramamurthi Arm. Arm Tech Symposia Arm Limited
Arm Mbed Edge Shiv Ramamurthi Arm Arm Tech Symposia 2017 IoT increasing efficiency, yield, and convenience Commercial buildings Better energy & space utilization Precision farming and connected sites Increased
More informationEnabling a Richer Multimedia Experience with GPU Compute. Roberto Mijat Visual Computing Marketing Manager
Enabling a Richer Multimedia Experience with GPU Compute Roberto Mijat Visual Computing Marketing Manager 1 What is GPU Compute Operating System and most application processing continue to reside on the
More informationDesign guidelines for embedded real time face detection application
Design guidelines for embedded real time face detection application White paper for Embedded Vision Alliance By Eldad Melamed Much like the human visual system, embedded computer vision systems perform
More informationSYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS
SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous
More informationZynq-7000 All Programmable SoC Product Overview
Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform
More informationThe Next Steps in the Evolution of ARM Cortex-M
The Next Steps in the Evolution of ARM Cortex-M Joseph Yiu Senior Embedded Technology Manager CPU Group ARM Tech Symposia China 2015 November 2015 Trust & Device Integrity from Sensor to Server 2 ARM 2015
More informationAccelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru
More informationDeep Learning Accelerators
Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction
More informationNvidia Jetson TX2 and its Software Toolset. João Fernandes 2017/2018
Nvidia Jetson TX2 and its Software Toolset João Fernandes 2017/2018 In this presentation Nvidia Jetson TX2: Hardware Nvidia Jetson TX2: Software Machine Learning: Neural Networks Convolutional Neural Networks
More informationThe Next Steps in the Evolution of Embedded Processors
The Next Steps in the Evolution of Embedded Processors Terry Kim Staff FAE, ARM Korea ARM Tech Forum Singapore July 12 th 2017 Cortex-M Processors Serving Connected Applications Energy grid Automotive
More informationUnified Deep Learning with CPU, GPU, and FPGA Technologies
Unified Deep Learning with CPU, GPU, and FPGA Technologies Allen Rush 1, Ashish Sirasao 2, Mike Ignatowski 1 1: Advanced Micro Devices, Inc., 2: Xilinx, Inc. Abstract Deep learning and complex machine
More informationIndex. Springer Nature Switzerland AG 2019 B. Moons et al., Embedded Deep Learning,
Index A Algorithmic noise tolerance (ANT), 93 94 Application specific instruction set processors (ASIPs), 115 116 Approximate computing application level, 95 circuits-levels, 93 94 DAS and DVAS, 107 110
More informationThroughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Naveen Suda, Vikas Chandra *, Ganesh Dasika *, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, Yu
More informationRecurrent Neural Networks. Deep neural networks have enabled major advances in machine learning and AI. Convolutional Neural Networks
Deep neural networks have enabled major advances in machine learning and AI Computer vision Language translation Speech recognition Question answering And more Problem: DNNs are challenging to serve and
More informationOpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision. Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017
OpenCV on Zynq: Accelerating 4k60 Dense Optical Flow and Stereo Vision Kamran Khan, Product Manager, Software Acceleration and Libraries July 2017 Agenda Why Zynq SoCs for Traditional Computer Vision Automated
More information