Index
B. Moons et al., Embedded Deep Learning
Springer Nature Switzerland AG 2019

A

Algorithmic noise tolerance (ANT)
Application specific instruction set processors (ASIPs)
Approximate computing
  application level, 95
  circuit level
  DAS and DVAS
  DVAFS, 111
  dynamic and tunable, 91
  fault-tolerant operation, 90
  hardware
  low-precision JPEG compression
  overview of
  paradigm leverages
  processor-architecture, 94
  programming languages and compilers, 95
  quality management
  resilience identification, 92
  SoC-architecture level
  software level, 95
  SotA, 111
  voltage scaling, 97

B

Baseline BinaryNet architecture
  architecture overview
  FC/dense layer
  input decoding
  instructions and control overview
  neuron array
  system-level controller
Batch normalization
Benchmark network breakdown
  CONV layers, 181
  functional breakdown
  measurement results
  relative computation, 182
BinarEye
  benchmark network
  die photo
  digital binary neuron, 180
  DVAFS systems
  I2L system-level
  neuron split (sub-neurons), 172, 179
  physical implementation, 180
  vs. state-of-the-art
  SX implementation
  system-level performance
BinaryNets
  applications
  ASIC, 192
  batch normalization
  BinarEye (see BinarEye)
  vs. BinaryEye, MSBNN and SX
  binary neural networks, 153
  computer architecture
  Input-to-Label accelerator
  layers
  mixed-signal compute, 175
  object detection
  processor architecture, 158
  scalable processing, 158
Bit-write-enable (BWE), 165

C

Caffe, 23
Cascaded systems, see Hierarchical cascaded systems
Clustered voltage scaling (CVS), 101
Clustering
  binary energy
  building blocks and concepts
  linearly quantized models
  linear quantization techniques, 83
CNN, see Convolutional neural networks (CNN)
Convolutional layer (CONVL)
  formal mathematical description
  parameters, 8–9
Convolutional neural networks (CNN)
  AlexNet, 10
  building blocks
  convolutional layer, 8–9
  DenseNet block
  efficient neural networks
  FC vs. CONV layers, 9–10
  fully connected layers, 9
  hierarchical system (see Face recognition system-CNN)
  high-dimensional input data, 7–8
  inception networks
  mathematical description, 9
  max-pooling layers, 9
  Megaflops
  MobileNets
  multi-layers, 8
  network architectures
  nonlinearity layer, 9
  operations efficiency, 12, 15
  parameter sharing, 8
  residual neural networks
  SotA Xception architecture
  sparse connectivity, 8, 122
  weight efficiency
  Xception

D

DAS, see Dynamic-accuracy-scaling (DAS)
Data augmentation, 22
Deep feed-forward neural networks
  graphical representation, 6
  hidden layer, 6
  input and output layer, 6
  universal approximation theorem, 7
  vector-to-scalar function, 7
Deep learning
  concept of, 5
  convolutional neural networks, 7–15
  deep feed-forward networks, 6–7
  developing field, 4
  GoogleNet, 5–6
  machine learning, 4–5
  multi-layered representational approach, 4
  neural networks
  recurrent neural networks
  representation, 5
  visualizations, 5
Demultiplexer (XMUX)
Direct memory access (DMA) controller, 122
DVAFS, see Dynamic-voltage-accuracy-frequency-scaling (DVAFS)
DVAFS-compatible Envision processor
  benchmarks
  chip photograph and layout
  computer architecture, 148
  conceptual overview, 136
  Envision V2 overview, 148
  low precision processing, 143
  measurements, 139
  optimal body-biasing
  performances
  physical implementation
  processing architecture
  processor architecture, 136
  relative energy consumption
  RTL level hardware
  vs. SotA
  sparse processing, 144
DVAS, see Dynamic-voltage-accuracy-scaling (DVAS)
DVAS-compatible Envision processor
  benchmarks
  chip photograph and layout
  computer architecture, 135
  dynamic precision
  full precision baseline
  performance comparison
  relevant benchmarks
  physical implementation
  RTL level hardware
  source code, 126
  sparse processing, 132
  vs. state of the art
  V1 overview, 135
Dynamic-accuracy-scaling (DAS)
  compatible building blocks, 108
  energy-accuracy trade-off, 100
  multipliers, 100
  precision scaling
  switching activity, 97
Dynamic-voltage-accuracy-frequency-scaling (DVAFS)
  accuracy-scalable, 99
  accuracy scaling, 99, 104
  building blocks
  CIFAR-10 accuracy, 185
  circuit level, 96
  compatible building blocks
  compatible Envision system
  depth-scaling, 184
  digital building blocks, 99
  DVAS and DAS
  energy-accuracy trade-off
  energy consumption
  energy savings
  energy vs. accuracy, 104
  enforcing critical path scaling, 110
  frequency scaling, 98
  functional implementation
  granular supply scaling, 110
  multipliers, 109
  non accuracy-scalable, 99
  performance of
  positive circuit delay slack
  power distribution and consumption, 106
  precision scaling
  projected energy consumption, 107
  resilience identification
  rounding, 109
  SIMD processor
  sub-net parallelism
  subword-parallel processing, 102
  system level
  truncating, 109
  voltage scaling, 101
Dynamic-voltage-accuracy-scaling (DVAS)
  compatible building blocks, 108
  compatible Envision processor
  energy-accuracy trade-off
  precision scaling
  switching activity, 98
  voltage scaling, 101

E

Electronic design automation (EDA) tools, 110
Embedded deep learning
  analog vs. digital compute, 198
  application level, 196
  BinarEye
  binary neural networks
  computing platform
  disadvantages, 195
  DVAFS
  Envision, 117, 197
  future reference, 199
  hardware-software co-optimizations, 196
  hierarchical cascaded systems, 33
  MSBNN, 197
Embedded deep neural networks
  accuracy vs. model size, 10
  accuracy vs. operations, 3
  artificial intelligence, 1
  challenges of
  cloud computing, 24
  deep learning, 4–23
  edge processing platforms, 24
  embedded neural networks, 1–2
  levels of
  machine learning, 2–4
  real-time neural networks
  state-of-the-art (SotA) techniques, 1
Energy-delay-product (EDP), 135
Energy model
  digital circuit, 63
  generic hardware platform, 63
  hardware platform
  off-chip memory system, 64
  root-mean-square-error (RMSE), 63
Energy vs. accuracy, 71
Envision
  ASICs
  convolutional neural networks, 122
  DVAFS-compatible system
  DVAS
  hardware platforms, 116
  key characteristics, 117
  neural network acceleration
  parallel processing, 121
  2D-MAC processor architecture

F

Face recognition system-CNN
  cost, precision and recall
  face detection
  high resolution images
  N-stage system
  optimization problem
  parametrized network topology
  POR vs. recall
  recall and precision vs. efficiency
  wake-up approach, 48
  window approach, 47
Fault-tolerance (deep neural networks), 60–63
Feed-forward neural network, see Deep feed-forward neural networks
Finite-state-machine (FSM), 57
Fully connected layer (FCL)
  activations and weights
  convolutional layer, 8–9
  deep feed-forward neural networks, 7
  dense layer
  neuron operation, 155

H

Hardware-algorithm co-optimization
  algorithmic and processor architecture, 56
  clustering
  energy model
  energy-modelling, 63–65
  fault-tolerance
  network structure
  neural network architecture, 61
  neural network characteristics
  sparse neural networks
  sparsity
  test-time quantized neural network
  train-time quantized neural networks
Hierarchical cascaded systems
  classification system
  CNN-face recognition (see Face recognition system-CNN)
  cost, precision and recall
  detection/classification systems, 33
  hierarchical stages, 34
  input data statistics
  key contributions, 34–35, 53
  mobile device
  number of stages
  optimal stage metrics, 46
  optimization parameters, 41
  processing concept, 34
  relative target recall
  roofline model and real classifiers
  wake-up-based systems, 35–36, 46

I

Inexact arithmetic circuits
Input-to-Label (I2L) accelerator architecture
  baseline network architecture
  chip architecture
  CONV and MaxPool layers, 160
  SX architecture
  X architecture
Instruction set architecture (ISA), 94

L

Least-significant-bits (LSB)
Long short-term memory (LSTM)

M

Machine learning
  classification applications, 3
  experience E, 3–4
  performance measure P, 3
  task T, 3
Max-pooling layers
Metal-oxide-metal (MOM), 173
Mixed-signal binary neural network (MSBNN)
  analog processing
  classification accuracy
  die photo of
  high-performance low-leakage, 175
  measurement results
  SC neuron
  vs. state of the art, 176, 178
Momentum vs. plain SGD
Most-significant-bits (MSB)

N

Natural language processing (NLP), 20
Network structure
  dataflows
  data movements
  DRAM interface, 59
  finite-state-machine, 57
  in-memory computing, 59
  input stationary approach
  output stationary, 57
  parallelism, 57
  systolic processing concept
  weight stationary, 57
  well-designed memory
Neural network training
  backpropagation, 18
  data sets
  loss function
  momentum
  optimization
  regularization
  SGD algorithm, 18
  supervised learning algorithms, 17
  train and validation subset
  training frameworks
Neuron array
  CONV cycles
  hardwired binary neuron
  load (LD) sequences, 163
  max-pooling
  POPCOUNT operator, 162
  processing sequence
  SRAM
  weight updates
  X architecture, 161, 163
Nonlinearity layer, 155

O

Overfitting network

P

Parameter norm penalties, 22
Pass-on-class (poc), 35–36, 38–40, 50
Per-layer quantization
Precision
  convolutional neural network model
  hierarchical cascaded systems
  optimization, 41
  recall and efficiency
  scaling techniques, 94
Probability density functions (PDF)
Pytorch, 23

Q

Quantized neural networks (QNNs), see Train-time quantized neural networks

R

Recall
  accuracy, 36
  convolutional neural network
  vs. efficiency
  hierarchical cascaded systems
  optimization problem, 41
  targets and input statistics
Receiver operating characteristic (ROC) curves, 40
Rectified Linear Unit (ReLU), 155
Recurrent neural networks (RNN)
Regularization techniques
Roofline model
  curves
  pass-on-rate and recall
  real classifiers, 39
  ROC curves, 40

S

State of the Art (SotA)
  vs. DVAS
  Envision V2
  hierarchical cascaded systems, 33
Stochastic gradient descent (SGD) algorithm, 18
Supervised learning algorithms, 4
Switched-capacitor (SC) neuron
  analog compute, 27
  capacitive DAC (CDAC)
  charge redistribution
  CNN filter computation, 174
SX architecture
  different operating modes
  LD-CONV cycle, 171
  neuron split (sub-neurons)
  operating modes
  RGB input images
  system-level view

T

Tensorflow, 23
Test-time quantized neural network
  accurate classification
  benchmarks, 72
  contributions
  energy-accuracy space
  experiments, 66
  floating-point numbers, 65
  performance- and energy-related effects, 66
  precision scaling, 70
  relative energy consumption, 71
  ReLU layers, 70
  sparse FPNNs
  sparse neural networks
Theano, 23
Training, see Neural network training
Train-time quantized neural networks
  activations
  benchmark data sets
  BinaryNets, 81
  energy savings, 77
  error rate vs. energy, 82
  evaluation, 74
  FPNN vs. QNN, 73
  generalization, 73
  methodology, 83
  MNIST, SVHN, and CIFAR-10
  model sizes and inference complexity, 79–80
  multiple network topologies
  on-chip memory chip-models, 81
  pareto-optimal topologies
  QNN input layers, 76
  recognition accuracy, 80
  ResNet network architecture
  training
  weights
2D-MAC array
  CNN dataflow
  compute units, 121
  FIFO-input data, 117, 118
2D-MAC processor architecture
  assembly code, 125
  direct memory access controller, 122
  energy-efficient
  hardware extensions
  high-level overview
  on-chip data memory topology, 121
  processor datapath
X architecture, see Baseline BinaryNet architecture

U

Underfitting network
Uniform quantization, 67–68, 71
Unsupervised learning algorithms, 4

V

Voltage over-scaling (VOS) techniques

W

Wake-up-based systems, 35–36, 46, 48