Exploiting Hidden Layer Modular Redundancy for Fault-Tolerance in Neural Network Accelerators

Schuyler Eldridge and Ajay Joshi
Department of Electrical and Computer Engineering, Boston University
schuye@bu.edu
January 30, 2015

This work was supported by a NASA Office of the Chief Technologist's Space Technology Research Fellowship.
Motivation

Leveraging CMOS Scaling for Improved Performance is Becoming Increasingly Hard
Contributing factors include:
- Fixed power budgets
- An eventual slowdown of Moore's Law
Computer engineers are increasingly turning toward alternative designs.

Alternative Designs
- As an alternative, others are investigating general- and special-purpose accelerators.
- One actively researched accelerator architecture is the neural network accelerator.
Artificial Neural Networks

Figure: Two-layer neural network with i-h-o nodes (inputs X_1..X_i feeding I_1..I_i, hidden neurons H_1..H_h, outputs O_1..O_o producing Y_1..Y_o, plus bias nodes).

Artificial Neural Network
- Directed graph of neurons
- Edges between neurons are weighted

Use in Applications
- Machine Learning
- Big Data
- Approximate Computing
- State Prediction
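A two-layer network like the one in the figure can be evaluated in a few lines. The following is a minimal sketch, not the accelerator's implementation: the sigmoid hidden activation, linear outputs, and bias-column weight layout are all assumptions, since the slides do not specify them.

```python
import numpy as np

def forward(x, w_ih, w_ho):
    """Evaluate an i-h-o network.

    x:    (i,)     input vector X_1..X_i
    w_ih: (h, i+1) input->hidden weights; the last column is the bias weight
    w_ho: (o, h+1) hidden->output weights; the last column is the bias weight
    """
    # Hidden layer: weighted sum of inputs plus bias, through a sigmoid.
    hidden = 1.0 / (1.0 + np.exp(-(w_ih[:, :-1] @ x + w_ih[:, -1])))
    # Output layer: linear combination of hidden activations plus bias.
    return w_ho[:, :-1] @ hidden + w_ho[:, -1]

# Example: a 2-2-2 network like the base network in the N-MR figures.
rng = np.random.default_rng(0)
y = forward(rng.normal(size=2), rng.normal(size=(2, 3)), rng.normal(size=(2, 3)))
```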
Neural Networks and Fault-Tolerance

The Brain is Fault-Tolerant!
"Ergo neural networks are fault-tolerant." This isn't generally the case!

Do Neural Networks Have the Potential for Fault-Tolerance?
- Neural networks have a redundant structure: there are multiple paths from input to output.
- Regression tasks often approximate smooth functions, so small changes in inputs or internal computations may cause only small changes in the output.
- However, there is no implicit guarantee of fault-tolerance unless you train a neural network to specifically demonstrate those properties.
N-MR Technique

Steps for Amount of Redundancy N
1. Replicate each hidden neuron N times.
2. Replicate each hidden neuron connection for each new neuron.
3. Multiply all connection weights by 1/N.

Figure: N-MR-1 (the base two-layer network with hidden neurons H_1 and H_2)
Figures: N-MR-2, N-MR-3, and N-MR-4 applied to the same base network, shown alongside N-MR-1. The two hidden neurons H_1 and H_2 are replicated into 4, 6, and 8 hidden neurons respectively; the input and output layers are unchanged.
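The three steps map directly onto the weight matrices of the forward-pass sketch above. A minimal sketch follows, with one interpretive assumption: step 3's 1/N scaling is applied to the replicated hidden-to-output weights, which makes the fault-free N-MR network compute exactly the same output as the original.

```python
import numpy as np

def apply_nmr(w_ih, w_ho, n):
    """Apply N-MR to a two-layer i-h-o network, yielding an i-(n*h)-o network.

    w_ih: (h, i+1) and w_ho: (o, h+1); the last column of each is the bias.
    """
    h = w_ih.shape[0]
    # Steps 1-2: replicate each hidden neuron, with all of its incoming
    # connections, n times (rows are repeated consecutively).
    w_ih_nmr = np.repeat(w_ih, n, axis=0)             # (n*h, i+1)
    # Step 3: scale the replicated hidden->output connections by 1/n so the
    # n copies of each hidden neuron sum back to its original contribution.
    w_out = np.repeat(w_ho[:, :h], n, axis=1) / n     # (o, n*h)
    return w_ih_nmr, np.hstack([w_out, w_ho[:, h:]])  # output bias unchanged

# Fault-free, forward(x, *apply_nmr(w_ih, w_ho, n)) matches forward(x, w_ih, w_ho).
```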
Neural Network Accelerator Architecture

Figure: Block diagram of our neural network accelerator (NN config and data storage unit, intermediate storage, control, and core communication).

Basic Operation in a Multicore Environment
- Threads communicate neural network computation requests to this accelerator.
- The accelerator allocates processing elements (PEs) to compute the outputs of all pending requests, as sketched below.
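The allocation step can be pictured as a simple work queue. The sketch below is hypothetical: Request and PE are illustrative stand-ins for the accelerator's internal state, since the slides only state that pending requests are mapped onto processing elements.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    thread_id: int  # core/thread that issued the NN computation request
    layer: int      # layer of the network being evaluated
    neuron: int     # neuron within that layer

@dataclass
class PE:
    busy: bool = False
    work: Optional[Request] = None

def allocate(pending: "deque[Request]", pes: "list[PE]") -> None:
    """Hand each idle processing element the next pending neuron evaluation."""
    for pe in pes:
        if not pe.busy and pending:
            pe.work = pending.popleft()
            pe.busy = True
```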
Evaluation Overview

Table: Evaluated neural networks and their topologies

Application          | NN Topology | Description
blackscholes (b) [1] | 6-8-8-1     | Financial option pricing
rsa (r) [2]          | 30-30-30    | Brute-force prime factorization
sobel (s) [1]        | 9-8-1       | 3x3 Sobel filter

Methodology
- We vary the amount of N-MR for the applications in the table above, running on our NN accelerator architecture.
- We introduce a random fault into a neuron and measure the accuracy and latency.

[1] R. St. Amant et al., "General-purpose code acceleration with limited-precision analog computation," in ISCA, 2014, pp. 505-516.
[2] A. Waterland et al., "ASC: Automatically scalable computation," in ASPLOS, 2014, pp. 575-590.
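The fault-injection step can be sketched as follows. The fault model here, forcing one randomly chosen hidden neuron's output to zero, is an assumption for illustration; the slides say only that a random fault is introduced into a neuron.

```python
import numpy as np

def forward(x, w_ih, w_ho, dead=None):
    """Two-layer network as before, optionally with one faulty hidden neuron."""
    hidden = 1.0 / (1.0 + np.exp(-(w_ih[:, :-1] @ x + w_ih[:, -1])))
    if dead is not None:
        hidden[dead] = 0.0  # stuck-at-zero fault on one hidden neuron
    return w_ho[:, :-1] @ hidden + w_ho[:, -1]

def fault_mse(x, w_ih, w_ho, rng):
    """MSE of a randomly faulted evaluation against the fault-free output."""
    gold = forward(x, w_ih, w_ho)
    bad = forward(x, w_ih, w_ho, dead=rng.integers(w_ih.shape[0]))
    return float(np.mean((gold - bad) ** 2))
```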
Evaluation: Normalized Latency

Figure: Latency normalized to N-MR-1 for blackscholes, sobel, and rsa, plotted against the amount of N-MR (1 to 7) with a linear baseline.

Latency Scaling with N-MR
- Work, where work is the number of edges to compute, scales linearly with N-MR.
- However, latency scales sublinearly for our accelerator.
- Increasing N-MR means more work, but also more efficient use of the accelerator.
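To make the linear-work claim concrete, a back-of-the-envelope edge count for a two-layer i-h-o network under N-MR, counting bias edges, is

    E(N) = N*h*(i+1) + o*(N*h+1) = N*h*(i+o+1) + o,

which grows linearly in N. The sublinear latency therefore comes from better utilization of the accelerator's PEs, not from less work.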
Evaluation: Accuracy

Figure: Left: percentage error increase vs. amount of N-MR (log scale); right: accuracy normalized to N-MR-1, for blackscholes (MSE), rsa (% correct), and sobel (MSE).

Accuracy and N-MR
Generally, accuracy improves with increasing N-MR.
Evaluation: Combined Metrics

Figure: Normalized energy-delay product (EDP) for varying amounts of N-MR, for blackscholes, rsa, and sobel.

Cost of N-MR
- We evaluate the cost using the energy-delay product (EDP).
- The cost is high, as N-MR increases both energy and delay.
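For reference, the combined metric is the standard energy-delay product; normalizing to N-MR-1, as in the latency plot, is an assumption here since the figure says only "Normalized EDP":

    EDP(N) = E(N) * D(N),  plotted as EDP(N) / EDP(1).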
Discussion and Conclusion

An Initial Approach
- As neural network accelerators become mainstream, approaches to improve their fault-tolerance will have increased value.
- N-MR is a preliminary step toward leveraging the potential for fault-tolerance in neural networks.
- Other approaches do exist: training with faults; splitting important neurons and pruning unimportant ones.

Future Directions
- Varying N-MR at run time.
- Faults are currently assumed to be intermittent, but by varying the internal structure and enforcing that replicated neurons are scheduled on different PEs, a more robust approach can be developed.
- Run-time splitting of important nodes, or not computing unimportant nodes.
Summary and Questions

Figure: Latency, accuracy, and combined metrics (recap of the evaluation plots)
Figure: A two-layer NN
Figure: NN accelerator architecture