Low-Power Neural Processor for Embedded Human and Face detection


1 Low-Power Neural Processor for Embedded Human and Face Detection
Olivier Brousse (1), Olivier Boisard (1), Michel Paindavoine (1,2), Jean-Marc Philippe (3), Alexandre Carbon (3)
(1) GlobalSensing Technologies (GST), Dijon, France
(2) LEAD, Université de Bourgogne, CNRS, Dijon, France
(3) DACLE, CEA LIST, Nano-innov, Palaiseau, France
June 23rd 2016, NeuroSTIC, O. Brousse


2 Introduction
One way to optimize performance vs. complexity is the bio-inspired approach.
Human vision performance in terms of detection and recognition, on simple tasks, human brain vs. von Neumann computer (like a PC):
- Recognizes this image in less than one second (image on slide)
- But calculates in less than one second ( x = ?)
Artificial vision model proposal for embedded systems:
- Arithmetic calculations, used in image filtering for example: von Neumann (or Harvard) architectures
- Object recognition from natural images: neuro-inspired architectures
Human intelligence: artificial intelligence on silicon

3 Outline
- Introduction
- Neuro-Inspired Vision Models
- Hardware Accelerator for Neuro-Inspired Applications
- Application Examples
- Conclusion


4 Deep Neural Network Models
ImageNet classification (Hinton's team, hired by Google):
- 1.2 million high-resolution images, 1,000 different classes
- Top-5 error rate of 17% (huge improvement)
- Learned features on first layer
Facebook's DeepFace program (lab head: Y. LeCun):
- 4 million images, 4,000 identities
- 97.25% accuracy, vs. [...]% human performance

5 State-of-the-art in Recognition
Database (content): # images, # classes, best score
- MNIST (handwritten digits): 60,[...] images; best score [...]79% [3]
- GTSRB (traffic signs): ~50,[...] images; best score [...]46% [4]
- CIFAR-10 (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck): 50,000 + 10,000 images; 10 classes; best score 91.[...]% [5]
- Caltech-101: ~50,[...]; best score [...]% [6]
- ImageNet: ~1,000,000 images; 1,000 classes; best score Top-5 83% [1]
- DeepFace: ~4,000,000 images; 4,[...] classes; best score [...]% [2]
The databases are listed in order of increasing complexity.
The state of the art is a deep neural network every time.

6 CNNs Organization
Deep = number of layers >> 1


7 State-of-the-art CNN Example
The German Traffic Sign Recognition Benchmark (GTSRB): 43 traffic sign types, > 50,000 images
- Neurons: 287,843
- Synapses: 1,388,800
- Total memory: 1.5 MB (with 8-bit synapses)
- Connections: 124,121,800
Near-human recognition (> 98%) [3]
[3] D. Ciresan, U. Meier, J. Masci, J. Schmidhuber, "Multi-column deep neural network for traffic sign classification", Neural Networks (32), pp. [...], 2012

8 Another Neuro-Inspired Model: Hmax (a Neuroscience Approach)
Hmax model: Serre et al., IEEE PAMI, 2007; Poggio et al., J Neurophysiol, 2007

9 Neuro-Inspired Models: The Hmax
S1 layer using Gabor filters

10 Neuro-Inspired Models: The Hmax
Original image, Gabor filters

11 Original image, Gabor filters

12 Neuro-Inspired Models: The Hmax
Hmax model performances

13 Hmax accelerator: Complexity
64 Gabor filters, 1-Mpixel image complexity:
- S1: optimized Gabor filters: 2.9 GMAC
- C1: max: 0.13 GOP
- RBF neural network: 0.4 GOP
- Total: 3.43 GMAC & OP
One IP camera, 1 Mpixel at 30 fps: 103 GOP/s
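The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below only re-adds the per-layer counts quoted above and multiplies by the frame rate; the function and constant names are illustrative, and treating one MAC as one operation in the total is an assumption that matches how the slide sums "GMAC & OP".

```python
# Per-frame operation counts for a 1-Mpixel image, in giga-operations,
# copied from the slide. Names are illustrative, not from the original deck.
S1_GABOR_GMAC = 2.9   # S1: optimized Gabor filters
C1_MAX_GOP = 0.13     # C1: local maximum
RBF_GOP = 0.4         # RBF neural network classifier
FPS = 30              # frame rate of the IP camera


def frame_complexity_gop():
    """Total giga-operations per frame (MACs and other ops counted alike)."""
    return S1_GABOR_GMAC + C1_MAX_GOP + RBF_GOP


def stream_load_gop_per_s(fps=FPS):
    """Sustained load in GOP/s for a video stream at the given frame rate."""
    return frame_complexity_gop() * fps


if __name__ == "__main__":
    print(round(frame_complexity_gop(), 2))   # 3.43
    print(round(stream_load_gop_per_s(), 1))  # 102.9, quoted as 103 on the slide
```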

14 Outline
- Introduction
- Neuro-Inspired Vision Models
- Hardware Accelerator for Neuro-Inspired Applications
- Application Examples
- Conclusion

15 PNeuro accelerator (joint CEA & GST laboratory, initiated in 2013)
Objective: design a processor that integrates signal processing functions and neural functions (Hmax, CNN) within the same chip.
Diagram labels: data in (signals, images); clusters of NeuroCores; classification result; from previous NeuroDSP; to next NeuroDSP.
PNeuro: a cascadable parallel architecture

16 PNeuro accelerator overview

17 PNeuro accelerator: Main Specifications
- Programmable NeuroCores, each of which can perform image/signal processing and neural functions
- Optimized for MAC and neural operations
- Signal processing: convolution filters, etc.
- Neural functions: weighted sum of inputs
- Can perform non-linear operations (maxima, tanh, ...)
- 1 NeuroCore represents 1 neuron
- NeuroCores can be time-multiplexed to implement bigger networks
- Optimized memory accesses for data locality and reuse
Variable number of clusters to accommodate different application domains and their performance requirements

18 PNeuro accelerator: Performances
Profiling results, based on FDSOI 28 nm technology:
- One cluster of 4 Neuro-Cores at 1 GHz: 32 GMAC/s with 70 mW power consumption, including memories and the controller
- 32 clusters at 1 GHz: 1024 GMAC/s at 2.2 W
- Energy efficiency: 465 GMAC/s/W
Full Hmax, one IP camera, 1 Mpixel at 30 fps: 103 GOP/s
- Needs 4 clusters of 4 Neuro-Cores (ceil(103/32) = 4): 280 mW
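The cluster count and power figures follow from a ceiling division, which can be sketched as below. This is a back-of-the-envelope check, not the authors' sizing tool: it assumes that one GOP of workload occupies one GMAC of cluster throughput (as the slide's 103/32 ratio implies) and that power scales linearly with the number of clusters. All names are illustrative.

```python
import math

CLUSTER_GMAC_PER_S = 32.0  # one cluster of 4 Neuro-Cores (slide figure)
CLUSTER_POWER_W = 0.070    # 70 mW per cluster, memories and controller included


def clusters_needed(load_gop_per_s):
    """Smallest whole number of clusters covering the load (the slide's sup[...])."""
    return math.ceil(load_gop_per_s / CLUSTER_GMAC_PER_S)


def power_w(n_clusters):
    """Power in watts, assuming it scales linearly with the cluster count."""
    return n_clusters * CLUSTER_POWER_W


def efficiency_gmac_per_s_per_w(gmac_per_s, watts):
    """Energy efficiency in GMAC/s per watt."""
    return gmac_per_s / watts


if __name__ == "__main__":
    n = clusters_needed(103.0)                               # full Hmax, 1 Mpixel at 30 fps
    print(n, round(power_w(n) * 1000))                       # 4 280
    print(round(efficiency_gmac_per_s_per_w(1024.0, 2.2)))   # 465
```

The single-cluster figures (32 GMAC/s at 70 mW) give about 457 GMAC/s/W with the same formula, so the 465 GMAC/s/W quoted on the slide corresponds to the 1024 GMAC/s at 2.2 W configuration.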

19 Outline
- Introduction
- Neuro-Inspired Vision Models
- Hardware Accelerator for Neuro-Inspired Applications
- Application Examples
- Conclusion

20 Face Detection Application Example (1/2)

21 Face Detection Application Example (2/2)
Complexity calculation divided by 8 (merge 8 scales)
- For one camera, 1 Mpixel at 30 fps: 12.9 GOP/s (103 GOP/s / 8)
- Needs one cluster with 2 NeuroCores: power consumption < 35 mW
- For VGA at 30 fps, only 1 NeuroCore: < 20 mW

22 Human detection: Hmin
- 64 Gabor filters (7x7 to 37x37) + original image
- Local maxima (C1), C1 output
- Classification with RBF neural network (S2, C2)
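To make the first two stages concrete, here is a minimal pure-Python sketch of an S1-like Gabor filtering step followed by a C1-like local maximum. It is an illustration under stated assumptions, not the Hmin or PNeuro implementation: the Gabor parameters (wavelength, sigma, gamma), the use of the absolute response, the non-overlapping pooling windows and the toy 12x12 image are all choices made for the example, and only one 7x7 filter is applied instead of the 64 listed on the slide.

```python
import math


def gabor_kernel(size, theta, wavelength, sigma, gamma=0.5):
    """Real-valued (cosine) Gabor kernel as a size x size list of lists; size must be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by the filter orientation theta (radians).
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_valid(image, kernel):
    """S1-like step: absolute filter response wherever the kernel fits in the image."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[abs(sum(kernel[a][b] * image[i + a][j + b]
                     for a in range(k) for b in range(k)))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]


def max_pool(response, pool):
    """C1-like step: maximum over non-overlapping pool x pool windows."""
    h, w = len(response), len(response[0])
    return [[max(response[i + a][j + b] for a in range(pool) for b in range(pool))
             for j in range(0, w - pool + 1, pool)]
            for i in range(0, h - pool + 1, pool)]


if __name__ == "__main__":
    # Toy 12x12 image with a vertical edge (illustrative input).
    image = [[0.0] * 6 + [1.0] * 6 for _ in range(12)]
    kernel = gabor_kernel(size=7, theta=0.0, wavelength=4.0, sigma=2.0)
    s1 = filter_valid(image, kernel)  # 6 x 6 response map
    c1 = max_pool(s1, pool=2)         # 3 x 3 pooled map
    print(len(s1), len(s1[0]), len(c1), len(c1[0]))  # 6 6 3 3
```

Each S1 output pixel costs one multiply-accumulate per kernel tap (49 for a 7x7 filter), which is why the Gabor stage dominates the complexity figures quoted elsewhere in the deck.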

23 Human Detection Application Example

24 Human Detection Application Example


25 Human Detection Application Example
Pipeline: S1 layer (Gabor filters), C1 layer (max pooling), RBF classification, human detected
To reduce complexity, optimization from Masquelier et al. (PLoS Computational Biology, 2007):
- 5 image scales (1, 0.7, 0.5, 0.35 and 0.25)
- 4 orientations
- One Gabor filter (15x15) per scale and per orientation: 20 Gabor filters
1-Mpixel image complexity:
- S1: optimized Gabor filters: 2.9 GMAC, reduced to 0.6 GMAC
- C1: max: 0.1 GOP
- RBF neural network: 0.3 GOP
- Total: 3.43 GMAC & OP, reduced to 1 GMAC & OP
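The filter count and the reduced total on this slide can be reproduced by enumerating the scale and orientation combinations. In the sketch below, the scales, the number of orientations, the 15x15 kernel size and the per-layer costs come from the slide; the evenly spaced orientation angles and all names are assumptions made for illustration.

```python
from itertools import product

SCALES = (1.0, 0.7, 0.5, 0.35, 0.25)  # image scales from the slide
N_ORIENTATIONS = 4                    # from the slide
KERNEL_SIZE = 15                      # one 15x15 Gabor filter per (scale, orientation)

# Assumption: orientations evenly spaced over 180 degrees (0, 45, 90, 135).
ORIENTATIONS_DEG = [i * 180.0 / N_ORIENTATIONS for i in range(N_ORIENTATIONS)]


def build_filter_bank():
    """One filter description per (scale, orientation) pair."""
    return [{"scale": s, "theta_deg": t, "size": KERNEL_SIZE}
            for s, t in product(SCALES, ORIENTATIONS_DEG)]


def optimized_total_gop():
    """Sum of the optimized per-layer costs quoted on the slide (S1 + C1 + RBF)."""
    return 0.6 + 0.1 + 0.3


if __name__ == "__main__":
    print(len(build_filter_bank()))                 # 20
    print(round(optimized_total_gop(), 2))          # 1.0
    print(round(3.43 / optimized_total_gop(), 2))   # 3.43, the reduction factor
```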

26 Human Detection Application Example
Complexity calculation divided by 3.43:
- Original Hmax, one IP camera, 1 Mpixel at 30 fps: 103 GOP/s
- Optimized Hmax, one IP camera, 1 Mpixel at 30 fps: 30 GOP/s
Using FDSOI 28 nm technology, one cluster of 4 Neuro-Cores at 1 GHz: 32 GMAC/s with 70 mW power consumption
- Optimized Hmax needs 4 Neuro-Cores for one IP camera, 1 Mpixel at 30 fps: power consumption 70 mW
- For VGA at 30 fps, only 2 Neuro-Cores: 35 mW
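The Neuro-Core counts for the 1-Mpixel and VGA cases can be reproduced with a per-core estimate. This sketch rests on several assumptions that the slides do not state explicitly: the load scales linearly with pixel count, a cluster's throughput and power split evenly across its 4 Neuro-Cores (8 GMAC/s and 17.5 mW each), a 1-Mpixel image is taken as 1000x1000, and VGA as 640x480. All names are illustrative.

```python
import math

CORE_GMAC_PER_S = 32.0 / 4      # assumed even split of the cluster throughput
CORE_POWER_MW = 70.0 / 4        # assumed even split of the cluster power
OPTIMIZED_GOP_PER_MPIXEL = 1.0  # optimized Hmax cost per 1-Mpixel frame (slide 25)
FPS = 30


def cores_needed(width, height, gop_per_mpixel=OPTIMIZED_GOP_PER_MPIXEL, fps=FPS):
    """Whole Neuro-Cores needed, assuming load is proportional to pixel count."""
    mpixels = width * height / 1e6
    load_gop_per_s = gop_per_mpixel * mpixels * fps
    return math.ceil(load_gop_per_s / CORE_GMAC_PER_S)


def power_mw(n_cores):
    """Estimated power in mW for the given number of Neuro-Cores."""
    return n_cores * CORE_POWER_MW


if __name__ == "__main__":
    for name, (w, h) in {"1 Mpixel": (1000, 1000), "VGA": (640, 480)}.items():
        n = cores_needed(w, h)
        print(name, n, power_mw(n))
    # 1 Mpixel 4 70.0
    # VGA 2 35.0
```

Under the same assumptions, the face detection load of 103/8 GOP/s gives 2 cores at 1 Mpixel and 1 core for VGA, consistent with the face detection slide.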

27 Human Detection Application Examples

28 Outline
- Introduction
- Neuro-Inspired Vision Models
- Hardware Accelerator for Neuro-Inspired Applications
- Application Examples
- Conclusion

29 Conclusion
- PNeuro architecture optimized for neuro-inspired algorithms: Hmax, convolutional neural networks and, more generally, deep neural networks
- PNeuro: a cascadable parallel architecture
- Performance in FDSOI 28 nm makes embedded applications with very low power consumption possible:
- Face detection needs only 20 mW for a VGA image at 30 fps
- Human detection needs only 35 mW for a VGA image at 30 fps
- PNeuro is also implemented on FPGA

30 PNeuro on FPGA
First demonstration of an FPGA-based PNeuro:
- Single-cluster configuration (4 Neuro-Cores)
- Embedded CNN application (60 neurons on the hidden layer, 450 KOps)
- Face extraction, 18000 images in the database, 96% recognition rate
Same application ported to 5 different architectures:
- Embedded CPU: Raspberry Pi 2 B, Odroid XU3
- Embedded GPU: NVidia Tegra K1 (batch)
- Desktop CPU: Intel I7
- PNeuro, quad Neuro-Cores, using an in-house prototyping board
Target                Frequency   Energy efficiency
Intel I7 (CPU)        3400 MHz    160 images/W
Quad ARM A15 (CPU)    2000 MHz    350 images/W
Quad ARM A7 (CPU)     900 MHz     380 images/W
Tegra K1 (GPU)        850 MHz     600 images/W
PNeuro (FPGA)         100 MHz     2000 images/W
The FPGA approach is already competitive with existing CPU & GPU solutions.
First FPGA product developed for early 2017 by GST:
- Embedded FPGA: Artix 100 (~1 W), 17.6 cm² for the board, including one cluster
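The relative gains implied by the energy-efficiency table can be read off with a short script. The values are those quoted on the slide; the dictionary layout and function names are illustrative only.

```python
# Energy efficiency in images/W, as quoted in the table above.
EFFICIENCY_IMAGES_PER_W = {
    "Intel I7 (CPU)": 160,
    "Quad ARM A15 (CPU)": 350,
    "Quad ARM A7 (CPU)": 380,
    "Tegra K1 (GPU)": 600,
    "PNeuro (FPGA)": 2000,
}


def gain_over(target, reference="PNeuro (FPGA)"):
    """How many times more images per watt the reference delivers than the target."""
    return EFFICIENCY_IMAGES_PER_W[reference] / EFFICIENCY_IMAGES_PER_W[target]


def best_target():
    """Target with the highest energy efficiency."""
    return max(EFFICIENCY_IMAGES_PER_W, key=EFFICIENCY_IMAGES_PER_W.get)


if __name__ == "__main__":
    for target in EFFICIENCY_IMAGES_PER_W:
        if target != "PNeuro (FPGA)":
            print(f"{target}: x{gain_over(target):.1f}")
    # Intel I7 (CPU): x12.5
    # Quad ARM A15 (CPU): x5.7
    # Quad ARM A7 (CPU): x5.3
    # Tegra K1 (GPU): x3.3
```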

31 Article in EE Times
Embedded World demonstration (Feb. 2016)

32 Thank you!

GST, from Sensor to Decision
Videos, signals, multimedia (sound, 2D and 3D images, ...)
GST: created in September 2011; 12 people (10 in R&D and 2 in business)
Technology based on neuro-inspired algorithms: E m