Texas Instruments Deep Learning (TIDL) for Sitara Processors


Transcription:


Texas Instruments Deep Learning (TIDL) for Sitara Processors Overview

Texas Instruments Deep Learning (TIDL) for Sitara Processors: Agenda
- AM57x SoC (System on Chip), AM5749 SoC, and Embedded Vision Engine (EVE) subsystem
- TI Deep Learning (TIDL) development flow
- TIDL on AM57x SoC
- Validated frameworks and network models
- TIDL example use cases
- TI Design on TIDL: TIDEP-01004

AM57x Cortex-A15 Processors
The highest-integration, generally available Arm Cortex-A15 processors on the market.
Benefits: unmatched performance in its class; a scalable family of pin-compatible single- and dual-core devices with a single Processor SDK for all Sitara devices.

AM57x processor family (software-compatible):
- AM5749: 2x A15 @ 1.5 GHz, 2x C66x @ 750 MHz, 2x 3D, 2D, HW video, 2x EVE-DLA
- AM5748: 2x A15 @ 1.5 GHz, 2x C66x @ 750 MHz, 2x 3D, 2D, HW video
- AM5746: 2x A15 @ 1.5 GHz, 2x C66x @ 750 MHz
- AM5728: 2x A15 @ 1.5 GHz, 2x C66x @ 750 MHz, 2x 3D, 2D, HW video
- AM5726: 2x A15 @ 1.5 GHz, 2x C66x @ 750 MHz
- AM5718: 1x A15 @ 1.5 GHz, 1x C66x @ 750 MHz, 1x 3D, 2D, HW video
- AM5716: 1x A15 @ 1.5 GHz/500 MHz, 1x C66x @ 750 MHz/500 MHz
- AM5708: 1x A15 @ 1 GHz, 1x C66x @ 750 MHz, 1x 3D, 2D, HW video
- AM5706: 1x A15 @ 1 GHz/500 MHz, 1x C66x @ 750 MHz/500 MHz
(On the original slide the devices are arranged along performance and multimedia axes.)

Software & development tools: Processor SDK with LTS Linux; TIDL supported by the SDK; TI GP EVM and Industrial Development Kit (IDK) available.

Embedded Vision Engine (EVE) subsystem
- 32-bit RISC processor (ARP32)
- Deep Learning Accelerator (DLA)
- 512-bit vector co-processor (VCOP): 16x 16-bit MACs/cycle, i.e. 10.4 GMAC/s (or 20.8 GOP/s) per EVE subsystem at 650 MHz
The AM5749 (2x A15 @ 1.5 GHz, 2x C66x @ 750 MHz, 2x 3D, 2D, HW video, 2x EVE-DLA) integrates two EVE subsystems, supporting 20.8 GMAC/s at 650 MHz.
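As a quick sanity check, the throughput figures above follow directly from the MAC rate and the clock. The small C++ arithmetic sketch below only reproduces the slide's numbers; it is not device code:

```cpp
#include <cstdio>

int main() {
    // Figures from the slide: 16x 16-bit MACs per cycle on the VCOP, 650 MHz EVE clock.
    constexpr double macs_per_cycle = 16.0;
    constexpr double clock_hz       = 650e6;

    constexpr double gmac_per_eve = macs_per_cycle * clock_hz / 1e9; // 10.4 GMAC/s per EVE
    constexpr double gops_per_eve = 2.0 * gmac_per_eve;              // 20.8 GOP/s (1 MAC = 2 ops)
    constexpr double gmac_am5749  = 2.0 * gmac_per_eve;              // 20.8 GMAC/s with 2x EVE

    std::printf("%.1f GMAC/s per EVE, %.1f GOP/s per EVE, %.1f GMAC/s on AM5749 (2x EVE)\n",
                gmac_per_eve, gops_per_eve, gmac_am5749);
    return 0;
}
```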

TI Deep Learning (TIDL) development flow
- Training (desktop/cloud): the network is trained in a deep learning/CNN framework (Caffe*, TensorFlow*, or Caffe-Jacinto*), producing a trained network.
- Translation (desktop Linux/Arm Linux): the TIDL import tool (device translation tool) converts the trained network into the format used by TIDL.
- Inference (embedded deployment on Sitara AM57x): the translated network runs on the device through the TIDL API, OpenCL, and the TIDL library.
*Only certain models are supported; constraints on the supported layers must be met. This is not a universal translator.
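To make the translation step concrete, the import tool in Processor SDK Linux is driven by a small configuration file that points at the trained model and names the TIDL binaries it produces. The sample below is illustrative only; the field names and value encodings shown here are assumptions, and the authoritative format is documented on the foundational_components_tidl page linked at the end of this presentation:

```
# Illustrative TIDL import configuration (field names assumed; verify against the SDK docs)
modelType          = 0                          # 0: Caffe, 1: TensorFlow (assumed encoding)
inputNetFile       = "deploy.prototxt"          # trained network description from the framework
inputParamsFile    = "trained_weights.caffemodel"
outputNetFile      = "tidl_net_model.bin"       # network binary consumed by the TIDL library
outputParamsFile   = "tidl_param_model.bin"     # quantized parameters consumed by the TIDL library
sampleInData       = "sample_input.y"           # sample input used for calibration/validation
```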

TI Deep Learning (TIDL) on the AM57x SoC
TIDL is a set of software components that enables deep learning inference at the edge. TIDL is available for free as part of Processor SDK Linux and runs on all AM57x devices, on the EVE subsystems and/or the C66x DSP cores. The AM5749 (2x EVE) is the highest-performance TIDL device currently available from TI.
NOTE: TIDL runs 1.5x-4x faster on the EVE subsystems and consumes less power than on the C66x cores: fully loaded, 2x EVE consume 220 mW, compared to 520 mW for 2x C66x cores.
The initial TIDL release:
- Supports Convolutional Neural Networks (CNNs) for spatial input data (video, images, ToF, radar, etc.)
- Does not support Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), or Gated Recurrent Unit (GRU) models used for time-variant data (speech, machine data, etc.)
NOTE: Support for RNN/LSTM/GRU is planned for future releases.
The TIDL APIs are used to program the EVE and DSP cores (see the sketch below).
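The host-side programming model can be pictured with a minimal sketch in the spirit of the TIDL API. Class and method names below are written from memory of the TIDL API documentation linked at the end of this presentation and should be verified there; buffer setup and error handling are omitted:

```cpp
#include "configuration.h"
#include "executor.h"
#include "execution_object.h"

using namespace tidl;

int main() {
    // Network/parameter binaries produced by the TIDL import tool, plus
    // input/output settings, come from a small text configuration file.
    Configuration config;
    if (!config.ReadFromFile("tidl_config.txt")) return 1;

    // Run on one EVE subsystem; DeviceType::DSP would target a C66x core instead.
    Executor executor(DeviceType::EVE, {DeviceId::ID0}, config);
    ExecutionObject* eo = executor.GetExecutionObjects()[0].get();

    // A real application fills the input buffer with a preprocessed frame
    // (e.g. via ArgInfo/SetInputOutputBuffer) before each ProcessFrame call.
    eo->ProcessFrameStartAsync();   // dispatch inference to the EVE
    eo->ProcessFrameWait();         // block until the frame has been processed
    return 0;
}
```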

TI Deep Learning (TIDL) network models
TIDL provides several TI network models: object classification, object detection, and pixel-level semantic segmentation.
TI's network models and TIDL leverage the following for embedded performance optimization:
- Efficient CNN configurations/network pruning
- Sparsity
- Fixed-point dynamic quantization
These techniques yield major computational and bandwidth reductions with only a minor decrease in accuracy. (The slide illustrates classification, classification + detection, and semantic segmentation examples.)
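The fixed-point dynamic quantization listed above can be pictured with a simple per-tensor 8-bit scheme: derive a scale from the tensor's observed dynamic range, then round values to integers. The sketch below is only a conceptual illustration of the idea, not TIDL's internal quantizer:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Conceptual per-tensor dynamic quantization to signed 8-bit values:
// the scale adapts to the data's range, so each layer/frame gets its own scale.
std::vector<int8_t> quantize_dynamic(const std::vector<float>& x, float& scale_out) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    const float scale = (max_abs > 0.0f) ? 127.0f / max_abs : 1.0f;
    scale_out = scale;

    std::vector<int8_t> q(x.size());
    for (size_t i = 0; i < x.size(); ++i)
        q[i] = static_cast<int8_t>(std::lround(x[i] * scale));
    return q;  // dequantize later as q[i] / scale
}
```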

Validated frameworks and network models
- BVLC-Caffe*: SqueezeNet 1.1 (supports floating-point and on-the-fly 8-bit quantized models)
- TensorFlow*: InceptionNet V1 (GoogLeNet), MobileNet 1.0 (supports floating-point and on-the-fly 8-bit quantized models)
- Caffe-Jacinto: JacintoNet11 (object classification), JDetNet (SSD-based object detection), JSegNet21 (pixel-wise semantic segmentation)
* Only certain neural network models are supported with the BVLC-Caffe and TensorFlow frameworks; certain constraints on layer parameters must be met for the TI device translation tool to handle them. Details: http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/foundational_components_tidl.html

TI Deep Learning (TIDL) example use cases
TIDL library 5.0 and later operates on spatial 2D data (RGB sensor, ToF, radar/mmWave, etc.) for classification + localization, object detection, and semantic segmentation.
Example applications and the role of deep learning:
- Vision computers, optical inspection, semiconductor test and manufacturing: classify products as good or defective; deep learning can improve accuracy and flexibility.
- Building automation: track, identify, and count people and objects.
- Automated sorting equipment, industrial robots: recognize objects and guide movement/placement.
- ATMs, currency counters: determine valid vs. counterfeit currency.
- Vacuum robots, robotic lawn mowers: improved ability to recognize obstacles and things like power cords/wires.
- Smart appliances: recognize objects in the refrigerator, determine food type, and auto-cook.

TIDL TI Design: TIDEP-01004
Features:
- Embedded deep learning inference on the AM57x SoC
- Performance-scalable TI deep learning library (TIDL library) on AM57x using C66x DSP only, EVE only, or DSP + EVE
- Performance-optimized reference CNN models for object classification, detection, and pixel-level semantic segmentation
- Walk-through of the TIDL development flow: training, import, and deployment
- Benchmarks of several popular deep learning networks on the AM5749
This reference design is tested and includes the TIDL library on DSP and EVE, reference CNN models, and a Getting Started Guide.
Applications:
- Machine vision: vision computers, code readers
- Automated machinery: automated sorting equipment, optical inspection
- EPOS: ATMs, currency counters
- Logistics robots: logistics robot CPU board
- Many others
Benefits:
- Guides through dense and sparse model training, and importing and deploying the model for inference on the AM57x SoC
- Highly integrated solution bringing deep learning inference to the edge
- Example trained networks ready to run
- TIDL API to run multiple (same or different) networks in parallel on different EVE subsystems and DSP cores (see the sketch below)
Tools & Resources:
- Design Guide
- Design files: schematics, BOM, Gerbers, software, etc.
- Device datasheets: AM5749 IDK (not yet public), AM5749 SoC (not yet public)
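The last benefit above, running networks in parallel across the EVE subsystems and DSP cores, can be sketched by creating one Executor per device class and dispatching frames asynchronously to each. As before, the names follow my reading of the TIDL API documentation and should be verified against it; the two configuration files are hypothetical:

```cpp
#include "configuration.h"
#include "executor.h"
#include "execution_object.h"

using namespace tidl;

int main() {
    // Hypothetical configurations: one network targeted at the EVEs, another at the DSPs.
    Configuration cfg_eve, cfg_dsp;
    cfg_eve.ReadFromFile("net_a_config.txt");
    cfg_dsp.ReadFromFile("net_b_config.txt");

    // One Executor per device class on an AM5749: 2x EVE and 2x C66x DSP.
    Executor exec_eve(DeviceType::EVE, {DeviceId::ID0, DeviceId::ID1}, cfg_eve);
    Executor exec_dsp(DeviceType::DSP, {DeviceId::ID0, DeviceId::ID1}, cfg_dsp);

    // Kick off one frame on every execution object, then wait; the EVEs and DSPs
    // process their (same or different) networks concurrently.
    for (const auto& eo : exec_eve.GetExecutionObjects()) eo->ProcessFrameStartAsync();
    for (const auto& eo : exec_dsp.GetExecutionObjects()) eo->ProcessFrameStartAsync();
    for (const auto& eo : exec_eve.GetExecutionObjects()) eo->ProcessFrameWait();
    for (const auto& eo : exec_dsp.GetExecutionObjects()) eo->ProcessFrameWait();
    return 0;
}
```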

For more information:
- Sitara AM57x processors: http://www.ti.com/am57x
- TI Deep Learning on Processor SDK Linux: http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/foundational_components_tidl.html
- TIDL API: http://downloads.ti.com/mctools/esd/docs/tidl-api/intro.html
- Caffe-Jacinto training framework: https://github.com/tidsp/caffe-jacinto
- Caffe models trained by TI: https://github.com/tidsp/caffe-jacinto-models
For questions regarding topics covered in this training, visit the support forums at the TI E2E Community website.
