Exploiting Hidden Layer Modular Redundancy for Fault-Tolerance in Neural Network Accelerators


1 Exploiting Hidden Layer Modular Redundancy for Fault-Tolerance in Neural Network Accelerators. Schuyler Eldridge and Ajay Joshi, Department of Electrical and Computer Engineering, Boston University. January 30, 2015. This work was supported by a NASA Office of the Chief Technologist's Space Technology Research Fellowship. Contact: schuye@bu.edu

2 Motivation. Leveraging CMOS scaling for improved performance is becoming increasingly hard. Contributing factors include fixed power budgets and an eventual slowdown of Moore's Law, so computer engineers are increasingly turning toward alternative designs. As an alternative, others are investigating general-purpose and special-purpose accelerators; one actively researched accelerator architecture is the neural network accelerator.

3 Artificial Neural Networks. Figure: a two-layer neural network with i × h × o nodes (inputs I_1..I_i, hidden neurons H_1..H_h, outputs O_1..O_o, plus bias nodes). An artificial neural network is a directed graph of neurons in which the edges between neurons are weighted. Uses in applications: machine learning, big data, approximate computing, and state prediction.
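To make the two-layer structure above concrete, here is a minimal forward-pass sketch in Python/NumPy. The sigmoid activation, the weight shapes, and every name in it are assumptions chosen for illustration; the presentation does not specify the accelerator's activation function or data layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_ih, b_h, w_ho, b_o):
    """Forward pass of a two-layer (single hidden layer) network.

    x    : input vector, shape (i,)
    w_ih : input-to-hidden weights, shape (h, i)
    b_h  : hidden bias weights, shape (h,)
    w_ho : hidden-to-output weights, shape (o, h)
    b_o  : output bias weights, shape (o,)
    """
    hidden = sigmoid(w_ih @ x + b_h)      # hidden-layer activations
    return sigmoid(w_ho @ hidden + b_o)   # output-layer activations

# Example: a tiny i=2, h=2, o=2 network with random weights.
rng = np.random.default_rng(0)
y = forward(rng.normal(size=2),
            rng.normal(size=(2, 2)), rng.normal(size=2),
            rng.normal(size=(2, 2)), rng.normal(size=2))
print(y)
```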

4 Neural Networks and Fault-Tolerance. "The brain is fault-tolerant, ergo neural networks are fault-tolerant": this isn't generally the case! Do neural networks have the potential for fault-tolerance? Neural networks have a redundant structure: there are multiple paths from input to output. Regression tasks often approximate smooth functions, so small changes in inputs or internal computations may cause only small changes in the output. However, there is no implicit guarantee of fault-tolerance unless a neural network is specifically trained to demonstrate those properties.

5 N-MR Technique. Figure: N-MR-1, the baseline network with two inputs (I_1, I_2), two hidden neurons (H_1, H_2), two outputs (O_1, O_2), and bias nodes. Steps for an amount of redundancy N (a code sketch follows this list):
1. Replicate each hidden neuron N times.
2. Replicate each hidden-neuron connection for each new neuron.
3. Multiply all connection weights by 1/N.
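The following sketch applies the three N-MR steps to the weight matrices of the two-layer network above. One interpretation note: the 1/N scaling here is applied to the replicated hidden-to-output connections, which keeps the fault-free output exactly unchanged under a nonlinear activation; reading the slide's "all connection weights" this way is my assumption, not something stated in the presentation.

```python
import numpy as np

# Repeated from the earlier sketch so this block runs on its own.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_ih, b_h, w_ho, b_o):
    hidden = sigmoid(w_ih @ x + b_h)
    return sigmoid(w_ho @ hidden + b_o)

def apply_nmr(w_ih, b_h, w_ho, n):
    """N-MR: replicate each hidden neuron n times.

    Each replica keeps the original incoming weights and bias (steps 1-2);
    the replicated outgoing weights are scaled by 1/n (step 3) so that the
    n replicas together contribute exactly what the original neuron did.
    """
    w_ih_nmr = np.tile(w_ih, (n, 1))      # (n*h, i): copied incoming weights
    b_h_nmr  = np.tile(b_h, n)            # (n*h,):   copied hidden biases
    w_ho_nmr = np.tile(w_ho, (1, n)) / n  # (o, n*h): copied + scaled outgoing weights
    return w_ih_nmr, b_h_nmr, w_ho_nmr

# Example: N-MR-3 triples the hidden layer but leaves the output unchanged.
rng = np.random.default_rng(1)
w_ih, b_h = rng.normal(size=(2, 2)), rng.normal(size=2)
w_ho, b_o = rng.normal(size=(2, 2)), rng.normal(size=2)
x = rng.normal(size=2)
w_ih3, b_h3, w_ho3 = apply_nmr(w_ih, b_h, w_ho, 3)
assert np.allclose(forward(x, w_ih, b_h, w_ho, b_o),
                   forward(x, w_ih3, b_h3, w_ho3, b_o))
```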

6 N-MR Technique (continued). Figures: N-MR-1 (left) and N-MR-2 (right); under N-MR-2 the hidden layer grows from H_1, H_2 to H_1..H_4.

7 N-MR Technique (continued). Figures: N-MR-1 (left) and N-MR-3 (right); under N-MR-3 the hidden layer grows to H_1..H_6.

8 N-MR Technique (continued). Figures: N-MR-1 (left) and N-MR-4 (right); under N-MR-4 the hidden layer grows to H_1..H_8.

9 Neural Network Accelerator Architecture. Figure: block diagram of our neural network accelerator (NN configuration and data storage unit, intermediate storage, control, and core communication). Basic operation in a multicore environment: threads communicate neural network computation requests to the accelerator, and the accelerator allocates processing elements (PEs) to compute the outputs of all pending requests.
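As a rough, illustrative model of the "threads submit requests, the accelerator allocates processing elements" behavior, the toy scheduler below queues requests and hands them to free PEs. The request fields, the PE pool, and the first-come-first-served assignment are all assumptions for illustration; none of them is taken from the actual accelerator design.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    thread_id: int
    nn_config: str          # which stored NN configuration to evaluate
    inputs: list = field(default_factory=list)

class ToyAccelerator:
    """Queues NN computation requests and assigns them to free PEs."""
    def __init__(self, num_pes):
        self.free_pes = deque(range(num_pes))
        self.pending = deque()

    def submit(self, request):
        self.pending.append(request)

    def schedule(self):
        """Pair pending requests with free PEs; returns (pe, request) tuples."""
        assignments = []
        while self.pending and self.free_pes:
            assignments.append((self.free_pes.popleft(), self.pending.popleft()))
        return assignments

acc = ToyAccelerator(num_pes=4)
for t in range(6):
    acc.submit(Request(thread_id=t, nn_config="blackscholes"))
print(acc.schedule())   # four requests get PEs; two stay pending
```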


14 Evaluation Overview.
Table: evaluated neural networks and their topologies.
  Application          | NN Topology | Description
  blackscholes (b) [1] |             | Financial option pricing
  rsa (r) [2]          |             | Brute-force prime factorization
  sobel (s) [1]        |             | Sobel filter
Methodology: we vary the amount of N-MR for the applications in Table 1 running on our NN accelerator architecture, introduce a random fault into a neuron, and measure the accuracy and latency.
[1] R. St. Amant et al., "General-purpose code acceleration with limited-precision analog computation," in ISCA, 2014.
[2] A. Waterland et al., "ASC: Automatically scalable computation," in ASPLOS, 2014.
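To illustrate the fault-injection part of the methodology, the sketch below forces one randomly chosen hidden neuron's output to a random value and reports the resulting output error on an N-MR'd network. The stuck-at-random-value fault model, the N-MR-4 setting, and all names here are illustrative assumptions and may differ from the fault model actually used in the evaluation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_ih, b_h, w_ho, b_o, faulty_neuron=None, fault_value=0.0):
    """Two-layer forward pass with an optional fault on one hidden neuron."""
    hidden = sigmoid(w_ih @ x + b_h)
    if faulty_neuron is not None:
        hidden[faulty_neuron] = fault_value   # inject the fault
    return sigmoid(w_ho @ hidden + b_o)

rng = np.random.default_rng(2)
i, h, o, n = 2, 2, 2, 4                                   # N-MR-4
w_ih, b_h = rng.normal(size=(h, i)), rng.normal(size=h)
w_ho, b_o = rng.normal(size=(o, h)), rng.normal(size=o)
# Replicate the hidden layer as in the N-MR sketch above.
w_ih_n, b_h_n = np.tile(w_ih, (n, 1)), np.tile(b_h, n)
w_ho_n = np.tile(w_ho, (1, n)) / n

x = rng.normal(size=i)
golden = forward(x, w_ih_n, b_h_n, w_ho_n, b_o)
faulty_neuron = rng.integers(n * h)                       # pick a random hidden neuron
faulty = forward(x, w_ih_n, b_h_n, w_ho_n, b_o,
                 faulty_neuron=faulty_neuron, fault_value=rng.uniform())
print("MSE with one faulty hidden neuron:", np.mean((golden - faulty) ** 2))
```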

15 Evaluation: Latency. Figure: latency normalized to N-MR-1 versus the amount of N-MR for blackscholes, sobel, and rsa, with a linear baseline for comparison. Latency scaling with N-MR: work, where work is the number of edges to compute, scales with N-MR; however, latency scales sublinearly on our accelerator. Increasing N-MR means more work, but also more efficient use of the accelerator.
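As a back-of-the-envelope check on the linear growth of work (assuming "edges to compute" counts every weighted connection, bias connections included, in the i × h × o topology shown earlier; that accounting is my assumption, not a definition from the presentation):

\[
\mathrm{work}(N) \;=\; \underbrace{(i+1)\,N h}_{\text{input}\to\text{hidden}} \;+\; \underbrace{(N h + 1)\,o}_{\text{hidden}\to\text{output}} \;\approx\; N\,h\,(i + o),
\]

so doubling N roughly doubles the number of edges, while the measured latency grows more slowly because the additional hidden-neuron evaluations are independent and keep more PEs busy.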

16 Evaluation: Accuracy. Figures: left, percentage error increase versus the amount of N-MR; right, accuracy normalized to N-MR-1 versus the amount of N-MR, for blackscholes (MSE), rsa (% correct), and sobel (MSE). Accuracy and N-MR: generally, accuracy improves with increasing N-MR.

17 Evaluation: Combined Metrics. Figure: normalized energy-delay product (EDP) for varying amounts of N-MR, for blackscholes, rsa, and sobel. Cost of N-MR: we evaluate the cost using the energy-delay product (EDP); the cost is high, since N-MR increases both energy and delay.
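For reference, the normalization used in the EDP figure can be expressed as a one-line computation; the numbers below are placeholders chosen for illustration, not measured results from the paper.

```python
def normalized_edp(energy, delay, baseline_energy, baseline_delay):
    """Energy-delay product relative to the N-MR-1 baseline."""
    return (energy * delay) / (baseline_energy * baseline_delay)

# Placeholder values only (joules, seconds), chosen for illustration.
base_e, base_d = 1.0e-6, 1.0e-3                        # hypothetical N-MR-1 point
print(normalized_edp(2.0e-6, 1.5e-3, base_e, base_d))  # -> 3.0
```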

18 Discussion and Conclusion. An initial approach: as neural network accelerators become mainstream, approaches to improve their fault-tolerance will have increasing value, and N-MR is a preliminary step toward leveraging the potential for fault-tolerance in neural networks. Other approaches exist, such as training with faults, and splitting important neurons while pruning unimportant ones. Future directions: varying N-MR at run time; faults are currently assumed to be intermittent, but by varying the internal structure and enforcing that replicated neurons are scheduled on different PEs, a more robust approach can be developed; run-time splitting of important nodes, or skipping the computation of unimportant nodes.

19 Summary and Questions. Figures (recap): latency, accuracy, and combined-metric results; the two-layer NN; and the NN accelerator architecture (intermediate storage, NN configuration and data storage unit, control, core communication).
