BrainChip October 2017


Brainchip OCTOBER 2017 1

Agenda
- Neuromorphic computing background
- Akida Neuromorphic System-on-Chip (NSoC)

Neuromorphic Computing Background

A Brief History of Neuromorphic Computing

Semiconductor Compute Architecture Cycles
[Timeline figure: architecture disruption cycles from the Intel 4004 (1971) through von Neumann and Harvard CPU/MPU architectures, a multiplicity of ISAs and vendors (x86/RISC), and a multiplicity of accelerators (FPU, GPU, DSP, VLIW, array, FPGA, ca. 1990), to the current AI acceleration disruption: AlexNet wins the ImageNet challenge in 2012, driving convolution and spiking architectures, memory and datatype innovation (floating, fixed, binary), and eventual consolidation.]

The Next Major Semiconductor Disruption
- $60B opportunity in the next decade
- Training is important, but inference is the major market
- Machine learning requires dedicated acceleration
[Chart: AI acceleration chipset forecast in $M, 2018-2025, broken out into training, inference, and general purpose. Source: Tractica, Deep Learning Chipsets, Q2 2018]

Explosion of AI Acceleration
[Diagram: software simulation of ANNs on x86 CPUs has split into two paths: convolutional neural networks served by re-purposed hardware acceleration in the cloud and at the edge, and neuromorphic computing served by customized acceleration (Google TPU, IBM TrueNorth test chip, Intel Loihi test chip, BrainChip, plus internal ASIC development).]

Traditional CPU Architecture Inefficient for ANNs
[Diagram: a traditional compute architecture (memory, control unit, arithmetic logic unit, processor, accumulator, input/output) is optimal for sequential execution; an artificial neural network architecture is distributed, parallel, and feed-forward.]

ANN Differences: Primary Compute Function
- Spiking neural network: synapses with reinforced and inhibited connections, neurons, spikes
- Convolutional neural network: linear algebra, matrix multiplication

Neural Network Comparison
- Computational functions: CNNs use matrix multiplication, ReLU, pooling, and FC layers (math intensive, high power, custom acceleration blocks); SNNs use threshold logic and connection reinforcement (math-light, low power, standard logic)
- Training: CNNs use off-chip backpropagation (requires large pre-labeled datasets, long and expensive training periods); SNNs are feed-forward, on- or off-chip (short training cycles, continuous learning)
- Deployment: CNNs need math-intensive cloud compute; SNNs suit low-power edge deployments
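The "threshold logic" that replaces matrix multiplication can be sketched with a toy integrate-and-fire neuron. This is a generic illustration, not BrainChip's actual neuron model; the weight, leak, and threshold values are made-up assumptions.

```python
# Toy integrate-and-fire neuron: the "threshold logic" an SNN uses in
# place of a CNN's matrix multiplication. Generic sketch only; the
# weights, leak, and threshold here are illustrative, not BrainChip's.

def run_neuron(spike_trains, weights, threshold=1.0, leak=0.1):
    """Accumulate weighted input spikes per time step; emit a spike
    when the membrane potential crosses threshold, then reset."""
    potential = 0.0
    fired_at = []
    for t, inputs in enumerate(spike_trains):
        # Incoming spikes are 0/1, so this is addition, not a
        # multiply-accumulate: a spike simply selects weights to add.
        potential += sum(w for s, w in zip(inputs, weights) if s)
        potential = max(0.0, potential - leak)  # leak toward rest
        if potential >= threshold:
            fired_at.append(t)                  # fire...
            potential = 0.0                     # ...and reset
    return fired_at

# Two input synapses; only coincident spikes push the neuron over
# threshold, so it fires at steps 1 and 3.
spikes = [(1, 0), (1, 1), (0, 0), (1, 1)]
print(run_neuron(spikes, weights=[0.7, 0.6]))  # -> [1, 3]
```

On event-driven hardware this addition-only update is what lets a spiking fabric avoid MAC arrays, which is the "math-light, standard logic" claim in the comparison above.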

Previous Neuromorphic Computing Programs
- Primarily research programs, in academia or government: SpiNNaker (Human Brain Project), IBM TrueNorth (DARPA), Neurogrid (Stanford), Intel Loihi test chip
- Investigating neuron simulation: 1,000s of ways to emulate spiking neurons
- Investigating training methods


Culmination of Decades of Development

World's First Neuromorphic System-on-Chip (NSoC)
- Efficient neuron model
- Innovative training methodologies
- Everything required for embedded/edge applications: on-chip processor, data-to-spike conversion
- Scalable for server/cloud
- Neuromorphic computing for multiple markets: vision systems, cybersecurity, financial tech

Akida NSoC Architecture

Akida Neuron Fabric
- Most efficient spiking neural network implementation: 1.2M neurons, 10B synapses
- Able to replicate most CNN functionality: convolution, pooling, fully connected
- Meets demanding performance criteria: 1,100 fps on CIFAR-10 at 82% accuracy
- Right-sized for embedded applications: 10 classifiers (CIFAR-10) in 11 layers, 517K neurons, 616M synapses

Neuron and Synapse Counts in the Animal Kingdom

The Most Efficient Neuromorphic Computing Fabric
[Chart: relative implementation efficiency in neurons and synapses, 3x to 300x]
Keys to efficiency:
- Fixed neuron model
- Right-sized synapses minimize on-chip RAM: 6MB compared to 30-50MB
- Programmable training and firing thresholds
- Flexible neural processor cores: highly optimized to perform convolutions, plus fully connected and pooling
- Efficient connectivity: a global spike bus connects all neural processors; multi-chip expandable to 1.2 billion neurons

Neuromorphic Computing Benefits
[Chart: top-1 accuracy vs. frames per second per watt, with approximate price]
- CIFAR-10: BrainChip Akida 82%, 1.4K fps/W, ~$10; IBM TrueNorth 83%, 6K fps/W, ~$1,000; Xilinx ZC709 80%, 6K fps/W, ~$1,000; Intel Myriad 2 79%, 18 fps/W, ~$10
- GoogLeNet: Intel Myriad 2 69%, 15 fps/W, ~$10; Tegra TX2 69%, 4.2 fps/W, ~$300
- Tremendous throughput with low power: math-lite, no MACs, no DRAM access for weights
- Comparable accuracy: optimized synapses and neurons ensure precision
Note: for comparison purposes only; data and pricing are estimated and subject to change.

Akida NSoC Applications

Vision Applications: Object Classification
- Complete embedded solution
- Flexible for multiple data types: lidar, pixel, DVS, ultrasound
- <1 watt
- On-chip training available for continuous learning
[Diagram: sensor and data interfaces feed spike conversion into a complex SNN model running on the neuron fabric, which outputs object classification.]
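The conversion step turns sensor values into spike trains. One common, simple scheme is rate coding: brighter pixels spike on more time steps. The sketch below illustrates only the idea; Akida's actual data-to-spike conversion is not described in these slides, and every name and constant here is a hypothetical choice.

```python
# Hypothetical rate-coding sketch of data-to-spike conversion: a
# pixel's intensity sets how many time steps it spikes on.
# Illustrative only; not Akida's documented conversion scheme.

def rate_code(pixels, steps=8, max_val=255):
    """For each time step, return the indices of pixels that spike.
    A pixel of intensity v spikes on roughly v/max_val of the steps."""
    trains = []
    for t in range(steps):
        # Deterministic thresholding against an evenly spaced ramp.
        fired = [i for i, v in enumerate(pixels) if v / max_val > t / steps]
        trains.append(fired)
    return trains

# Dark, mid-grey, and bright pixels spike on 0, 5, and 8 of 8 steps.
for t, fired in enumerate(rate_code([0, 128, 255])):
    print(t, fired)
```

A stochastic (Poisson) variant is also widely used; the deterministic ramp here just keeps the example reproducible.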

Financial Technology Applications: Fintech Data Analysis
- Fintech data (distinguishing parameters for stock characteristics and trading information) can be converted to spikes in software on a CPU or by the Akida NSoC
- Unsupervised learning on chip detects repeating patterns (clustering)
- These trading patterns and clusters can then be analyzed for effectiveness
[Diagram: a CPU supplies fintech data through the data interfaces to spike conversion and a complex SNN model on the neuron fabric, which outputs pattern recognition.]

Cybersecurity Applications: Malware Detection
- Supervised learning for file classification based on file properties
- File or packet properties (distinguishing parameters for files or network traffic) can be converted to spikes in software on a CPU or by the Akida NSoC
[Diagram: a CPU supplies file or packet properties through the data interfaces to spike conversion and a complex SNN model on the neuron fabric, which outputs file classification.]

Cybersecurity Applications: Anomaly Detection
- Supervised learning on known good behavior and anomalous behavior
- Behavior properties can be CPU loads for common applications, network packets, power consumption, fan speed, etc.
[Diagram: a CPU supplies behavior properties through the data interfaces to spike conversion and a complex SNN model on the neuron fabric, which outputs behavior classifiers.]

Creating SNNs: The Akida Development Environment

Akida Training Methods
- Unsupervised learning from unlabeled data: detection of unknown patterns in data; on-chip or off-chip
- Unsupervised learning with label classification: the first layers learn unlabeled features, which are labeled in the fully connected layer; on-chip or off-chip
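The unsupervised mode can be pictured with a toy connection-reinforcement rule of the kind the earlier comparison slide names: synapses that carry spikes when the neuron fires are strengthened, silent ones decay, so the weights drift toward whatever input pattern repeats. The rule and every constant below are illustrative assumptions, not the Akida training algorithm.

```python
# Toy unsupervised connection-reinforcement rule: on each fire,
# strengthen the synapses that carried spikes and decay the silent
# ones. Illustrative assumption; not Akida's actual training rule.

def train_unsupervised(patterns, n_inputs, threshold=2.0,
                       lr=0.2, decay=0.02, epochs=20):
    weights = [0.5] * n_inputs
    for _ in range(epochs):
        for inputs in patterns:
            potential = sum(w for s, w in zip(inputs, weights) if s)
            if potential >= threshold:                 # neuron fires
                for i, s in enumerate(inputs):
                    if s:
                        weights[i] = min(1.0, weights[i] + lr)   # reinforce
                    else:
                        weights[i] = max(0.0, weights[i] - decay)  # inhibit
    return weights

# A pattern repeating on inputs 0-3 (with a rare distractor) pulls the
# weights to mirror it, with no labels involved.
patterns = [(1, 1, 1, 1, 0, 0, 0, 0)] * 9 + [(0, 0, 0, 0, 1, 0, 1, 0)]
print(train_unsupervised(patterns, n_inputs=8))
```

After training, the weight vector has saturated on the repeating inputs and zeroed the rest, which is the clustering behavior the fintech slide describes.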

World's First NSoC
- Low power and small footprint of neuromorphic computing
- Highest performance per watt and per dollar
- Estimated tape-out 1H 2019, samples 2H 2019
- Complete solution for embedded/edge applications, but scalable for cloud/server usage