Exploiting Hidden Layer Modular Redundancy for Fault-Tolerance in Neural Network Accelerators


Exploiting Hidden Layer Modular Redundancy for Fault-Tolerance in Neural Network Accelerators

Schuyler Eldridge and Ajay Joshi
Department of Electrical and Computer Engineering, Boston University
schuye@bu.edu
January 30, 2015

This work was supported by a NASA Office of the Chief Technologist's Space Technology Research Fellowship.

Motivation

Leveraging CMOS scaling for improved performance is becoming increasingly hard. Contributing factors include:
- Fixed power budgets
- An eventual slowdown of Moore's Law

Computer engineers are therefore increasingly turning towards alternative designs. Others are investigating general- and special-purpose accelerators; one actively researched accelerator architecture is the neural network accelerator.

Artificial Neural Networks

Figure: Two-layer neural network with i x h x o nodes (inputs I_1..I_i, hidden neurons H_1..H_h, outputs O_1..O_o, plus bias nodes).

An artificial neural network is a directed graph of neurons in which the edges between neurons are weighted. A minimal forward pass is sketched below. Uses in applications:
- Machine learning
- Big data
- Approximate computing
- State prediction
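For concreteness, the inference step for the network in the figure can be written in a few lines. This is a minimal sketch only: sigmoid activations and biases folded in as extra weight columns are assumptions, not details stated on the slide.

```python
# Minimal sketch of the two-layer (input -> hidden -> output) network above.
# Assumed conventions: sigmoid activations; each weight matrix carries an
# extra column for the bias node shown in the figure.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, W_output):
    """One inference pass: x has i inputs, W_hidden is (h, i+1),
    W_output is (o, h+1); the +1 columns hold the bias weights."""
    x = np.append(x, 1.0)              # bias input
    hidden = sigmoid(W_hidden @ x)     # h hidden activations
    hidden = np.append(hidden, 1.0)    # bias node for the output layer
    return sigmoid(W_output @ hidden)  # o outputs
```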

Neural Networks and Fault-Tolerance

"The brain is fault-tolerant; ergo, neural networks are fault-tolerant." This isn't generally the case!

Do neural networks have the potential for fault-tolerance?
- Neural networks have a redundant structure: there are multiple paths from input to output.
- Regression tasks often approximate smooth functions: small changes in inputs or internal computations may only cause small changes in the output.

However, there is no implicit guarantee of fault-tolerance unless you specifically train a neural network to demonstrate those properties.

N-MR Technique

Figure: N-MR-1 (a network with inputs X_1, X_2, hidden neurons H_1, H_2, outputs Y_1, Y_2, and bias nodes; each hidden neuron appears once).

Steps for amount of redundancy N (see the sketch below):
1. Replicate each hidden neuron N times.
2. Replicate each hidden-neuron connection for each new neuron.
3. Multiply all connection weights by 1/N.
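The three steps can be sketched as a weight-matrix transformation, continuing the NumPy shapes from the previous sketch. One interpretive assumption: step 3 is applied here only to the replicated hidden-to-output weights, which keeps the fault-free network function exactly unchanged; scaling the input-side weights too would alter the hidden activations through the nonlinearity.

```python
# Sketch of N-MR applied to the hidden layer of the two-layer network
# above (same W_hidden / W_output shapes as the earlier forward() sketch).
import numpy as np

def n_mr(W_hidden, W_output, N):
    """Replicate each hidden neuron N times; rescale outgoing weights by 1/N."""
    h = W_hidden.shape[0]
    # Step 1: replicate each hidden neuron N times (its row of input weights).
    W_hidden_nmr = np.repeat(W_hidden, N, axis=0)
    # Step 2: replicate each hidden->output connection for each new neuron.
    W_out_main = np.repeat(W_output[:, :h], N, axis=1)
    # Step 3: scale the replicated connections by 1/N, so the summed
    # contribution of the N copies equals the original neuron's contribution.
    W_out_main /= N
    bias_col = W_output[:, h:]  # the output layer's bias column is kept as-is
    return W_hidden_nmr, np.hstack([W_out_main, bias_col])
```

With no faults present, forward(x, *n_mr(W_hidden, W_output, N)) produces exactly the same outputs as the original network, since each replica computes an identical activation.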

Figure: N-MR-1 alongside N-MR-2, N-MR-3, and N-MR-4. The hidden layer grows from 2 neurons (H_1, H_2) to 4, 6, and 8 replicated neurons, while the input and output layers are unchanged.

Neural Network Accelerator Architecture

Figure: Block diagram of our neural network accelerator (NN config and data storage unit, intermediate storage, control, and core communication).

Basic operation in a multicore environment (a toy model follows):
- Threads communicate neural network computation requests to this accelerator.
- The accelerator allocates processing elements (PEs) to compute the outputs of all pending requests.
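The basic operation described above can be illustrated with a toy model. The queueing discipline and all names here are assumptions for illustration, not the accelerator's actual control logic.

```python
# Toy model of the accelerator's basic operation: threads enqueue NN
# computation requests, and control logic hands pending requests to free
# processing elements (PEs). Purely illustrative; FIFO scheduling assumed.
from collections import deque

class Accelerator:
    def __init__(self, num_pes):
        self.free_pes = deque(range(num_pes))  # idle PE identifiers
        self.pending = deque()                 # requests awaiting a PE

    def request(self, nn_id, inputs):
        """Called by a thread to submit one NN computation request."""
        self.pending.append((nn_id, inputs))

    def step(self):
        """One allocation step: pair free PEs with pending requests."""
        issued = []
        while self.pending and self.free_pes:
            pe = self.free_pes.popleft()
            issued.append((pe, self.pending.popleft()))
        return issued
```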

Evaluation Overview

Table: Evaluated neural networks and their topologies

Application          | NN Topology | Description
blackscholes (b) [1] | 6-8-8-1     | Financial option pricing
rsa (r) [2]          | 30-30-30    | Brute-force prime factorization
sobel (s) [1]        | 9-8-1       | 3x3 Sobel filter

Methodology:
- We vary the amount of N-MR for the applications in Table 1 running on our NN accelerator architecture.
- We introduce a random fault into a neuron and measure the accuracy and latency (sketched below).

[1] R. St. Amant et al., "General-purpose code acceleration with limited-precision analog computation," in ISCA, 2014, pp. 505–516.
[2] A. Waterland et al., "ASC: Automatically scalable computation," in ASPLOS, 2014, pp. 575–590.
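A sketch of the fault-injection step, under an assumed stuck-at-0 fault in a single hidden neuron; the slide does not specify the fault model, and the helper names reuse the forward-pass conventions from the earlier sketches.

```python
# Hypothetical sketch of single-neuron fault injection: pick one hidden
# neuron at random and corrupt its output (stuck-at-0 assumed), then
# measure the resulting error over a test set.
import numpy as np

def forward_with_fault(x, W_hidden, W_output, faulty_neuron):
    x = np.append(x, 1.0)
    hidden = 1.0 / (1.0 + np.exp(-(W_hidden @ x)))
    hidden[faulty_neuron] = 0.0  # stuck-at-0 fault in one hidden neuron
    hidden = np.append(hidden, 1.0)
    return 1.0 / (1.0 + np.exp(-(W_output @ hidden)))

def mse_under_fault(inputs, targets, W_hidden, W_output, rng):
    faulty = rng.integers(W_hidden.shape[0])  # random hidden neuron
    errs = [forward_with_fault(x, W_hidden, W_output, faulty) - t
            for x, t in zip(inputs, targets)]
    return np.mean(np.square(errs))
```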

Evaluation: Normalized Latency

Figure: Latency normalized to N-MR-1 for blackscholes, sobel, and rsa against a linear baseline, at N-MR amounts of 1, 3, 5, and 7.

Latency scaling with N-MR:
- Work, defined as the number of edges to compute, scales linearly with N-MR.
- However, latency scales sublinearly on our accelerator: increasing N-MR means more work, but also more efficient use of the accelerator.

Evaluation: Accuracy

Figure: Left: percentage error increase versus amount of N-MR; right: accuracy normalized to N-MR-1 (blackscholes and sobel measured in MSE, rsa in % correct).

Generally, accuracy improves with increasing N-MR.

Evaluation: Combined Metrics

Figure: Energy-Delay Product (EDP), normalized to N-MR-1, for varying amounts of N-MR.

Cost of N-MR:
- We evaluate the cost using the Energy-Delay Product (EDP).
- The cost is high, since N-MR increases both energy and delay.
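For reference, EDP is the product of energy consumed and execution delay, so the normalized curves in the figure compound both overheads relative to N-MR-1:

```latex
\mathrm{EDP} = E \times D,
\qquad
\mathrm{EDP}_{\mathrm{norm}}(N) = \frac{E_N \, D_N}{E_1 \, D_1}
```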

Discussion and Conclusion

An initial approach:
- As neural network accelerators become mainstream, approaches to improve their fault-tolerance will have increased value.
- N-MR is a preliminary step to leverage the potential for fault-tolerance in neural networks.
- Other approaches do exist: training with faults, and splitting important neurons while pruning unimportant ones.

Future directions:
- Varying N-MR at run-time.
- Faults are currently assumed to be intermittent; by varying the internal structure and enforcing that replicated neurons are scheduled on different PEs, a more robust approach can be developed.
- Run-time splitting of important nodes, or skipping the computation of unimportant nodes.

Summary and Questions

Figure: Latency, accuracy, and combined metrics (recap of the evaluation plots).
Figure: A two-layer NN.
Figure: NN accelerator architecture.