Deploying Deep Neural Networks in the Embedded Space

Size: px

Start display at page:

Download "Deploying Deep Neural Networks in the Embedded Space"

Erica McCormick
5 years ago
Views:

1 Deploying Deep Neural Networks in the Embedded Space Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis 2 nd International Workshop on Embedded and Mobile Deep Learning (EMDL) MobiSys, 15 June

2 Who we are Stylianos I. Venieris Machine Learning Alexandros Kouris Machine Learning, Robotics Konstantinos Boikos Computer Vision, SLAM Manolis Vasileiadis Computer Vision Mudhar Bin Rabieah Machine Learning Nur Ahmadi Brain-Machine Interface Christos-Savvas Bouganis Lab Director Reader at Imperial College London

3 DNNs in the Embedded Space Variability in Performance Requirements

4 Conventional Embedded Platforms for Neural Networks Challenge: Huge design space Our Approach: Automated toolflows

5 Research Areas / Challenges

6 Challenge #1: Mapping Automation

7 Challenge #1: Mapping Automation

8 Challenge #1: Automated CNN-to-FPGA Toolflow

9 fpgaconvnet Design Space Exploration and Optimisation Design 1 Design 2 Hardware Stages Interconnections

10 Meeting the performance requirements

11 Comparison with Embedded GPUs: Same absolute power constraints (5W) DenseNet ResNet-152 GoogLeNet VGG16 AlexNet DenseNet ResNet-152 GoogLeNet VGG16 AlexNet fpgaconvnet - FPGA ZC MHz (GOp/s) TensorRT - GPU TX MHz (GOp/s) fpgaconvnet - FPGA ZC MHz (GOp/s) TensorRT - GPU TX MHz (GOp/s)

12 Challenge #2: Multi-CNN Systems

13 Challenge #2: Multi-CNN Systems Autonomous Drones

14 Challenge #2: Multi-CNN System

15 Multi-CNN FPGA design Conv Pool Layer Layer CNN Engine1 Conv Layer CNN Engine 2 Pool Layer Conv Layer Pool Layer Conv Layer C-PE Weights Mem. C-PE Dot-product Unit Folding Conv Layer Pool Layer Conv Layer Pool Layer Weights Mem. Weights Mem. CNN Engine N Multi-CNN Hardware Scheduler FPGA C-PE Weights Mem. PE Folding Off-chip Memory

16 Comparison with Embedded GPUs: Same absolute power constraints (5W) Latency-driven scenario batch size of 1 Up to 9.68 speedup with an average of 5.25 (geo. mean) Latency-driven scenario batch size of 1 Up to speedup with an average of 6.85 (geo. mean)

17 Challenge #3: Time-constrained Inference

18 Challenge #3: Time-constrained Inference CNN LSTM

19 Challenge #3: Time-constrained Inference

20 Impact on LSTM-based Image Captioning

21 Impact on LSTM-based Image Captioning

22 Impact on LSTM-based Image Captioning

23 Challenge #4: Privacy-aware Deep Learning

24 Challenge #4: Privacy-restricted Optimisation

25 Challenge #4: Privacy-aware Deep Learning

26 MACCs-per-PE Cascade C N N : High-Level System Architecture x x x x x x + + x + + x + x CEU PASS Memory FAIL

27 Challenge #4: Privacy-aware Deep Learning Layers, Weights, GND LUTs, DSPs Memory BW, On-chip,

28 Challenge #4: Privacy-aware Deep Learning

29 Summary

30 Publications Alexandros Kouris, Stylianos I. Venieris, and Christos-Savvas Bouganis CascadeCNN: Pushing the performance limits of quantisation. In SysML. Alexandros Kouris, Stylianos I. Venieris, and Christos-Savvas Bouganis CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks. In th International Conference on Field Programmable Logic and Applications (FPL). C. Kyrkou, G. Plastiras, T. Theocharides, S. I. Venieris, and C. S. Bouganis DroNet: Efficient Convolutional Neural Network Detector for Real- Time UAV Applications. In 2018 Design, Automation Test in Europe Conference Exhibition (DATE) Michalis Rizakis, Stylianos I. Venieris, Alexandros Kouris, and Christos-Savvas Bouganis Approximate FPGA-based LSTMs under Computation Time Constraints. In Applied Reconfigurable Computing - 14th International Symposium, ARC 2018, Santorini, Greece, May 2-4, 2018, Stylianos I. Venieris and Christos-Savvas Bouganis fpgaconvnet: A Framework for Mapping Convolutional Neural Networks on FPGAs. In 2016 IEEE 24 th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Stylianos I. Venieris and Christos-Savvas Bouganis fpgaconvnet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs. In NIPS 2017 Workshop on Machine Learning on the Phone and other Consumer Devices. Stylianos I. Venieris and Christos-Savvas Bouganis fpgaconvnet: Automated Mapping of Convolutional Neural Networks on FPGAs (Abstract Only). In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, S. I. Venieris and C. S. Bouganis Latency-Driven Design for FPGA-based Convolutional Neural Networks. In th International Conference on Field Programmable Logic and Applications (FPL). S. I. Venieris and C. S. Bouganis f-cnnx: A Toolflow for Mapping Multiple Convolutional Neural Networks on FPGAs. In th International Conference on Field Programmable Logic and Applications (FPL). Stylianos I. Venieris, Alexandros Kouris, and Christos-Savvas Bouganis Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions. In ACM Computing Surveys 51, 3, Article 56 (June 2018), 39 pages.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 1

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 1 fpgaconvnet: Mapping Regular and Irregular Convolutional Neural Networks on FPGAs Stylianos I. Venieris, Student Member, IEEE, and Christos-Savvas