Exploration of dynamic communication networks for neuromorphic computing


Eindhoven University of Technology

MASTER
Exploration of dynamic communication networks for neuromorphic computing

Huynh, P.K.

Award date: 2016

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. Users may download and print one copy of any publication from the public portal for the purpose of private study or research. You may not further distribute the material or use it for any profit-making activity or commercial gain.

Take down policy
If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 03. Dec. 2017

Department of Electrical Engineering
Electronic Systems Group

Exploration of dynamic communication networks for Neuromorphic Computing

Master Thesis
Khanh Huynh

Company supervisors: Dr. Anup Kumar Das, Prof. Dr. Francky Catthoor
University supervisor: Dr. Ir. Bart Mesman
Committee member: Dr. Mykola Pechenizkiy

Version 1.1
Eindhoven, 24th August 2016


Abstract

Brain-inspired Neural Networks for Machine Learning have been in the spotlight for scientific research and technology enthusiasts in the recent decade. There is no doubt that these algorithms perform well in the types of classification tasks that humans are good at, such as image processing and speech recognition. However, they are computationally heavy and thus need to be run on power-hungry machines. Moreover, the current approaches do not scale up to hundreds of millions or billions of neurons. As a result, research has been conducted in the field of Neuromorphic Computing to provide more power efficient platforms that can potentially be integrated and utilized in mobile devices to run such algorithms. This research aims at developing a scalable, highly dynamic and low-power communication network for neuromorphic clusters that tries to mimic the connection flexibility in the human brain. In the first part of the project, a hardware network simulator is developed to simulate spiking traffic between neuron clusters at the global level. Afterwards, this simulator is used to test different interconnect models. From these basic experiments, it is observed that placing neuron clusters in different positions on the network affects the latency and energy consumption of the system. We then formalize this problem of mapping neuron clusters onto network nodes to minimize communication cost and propose different algorithms to solve it. The research results show that mapping neural network communication on the right technology can help reduce power consumption and improve scalability.

Acknowledgement

This thesis is the result of a nine-month graduation project performed at IMEC/Holst Centre. IMEC has provided a truly unique opportunity to get acquainted with state-of-the-art research in a top-of-the-field industrial environment. During my time here, I had the chance to learn and grow both professionally and personally.

I would like to thank Anup Kumar Das, my company supervisor, for his guidance and continuous support. His cheerful attitude in daily life situations and his focus on work have inspired me. He has become not only my favourite supervisor, but also a good friend. Furthermore, I would like to convey my gratitude to Bart Mesman and Francky Catthoor, my supervisors from TU/e and IMEC Leuven, whose expertise and insights were invaluable throughout this project.

Moreover, I am grateful to have made many friends during my time at IMEC. Our student lunch group has shared a lot of enjoyable moments together both during and after work.

Last, but certainly not least, I would like to thank my family and all my friends, who are always there for me during my ups and downs. I am especially indebted to my mom, without whom there would be no such achievement today. I know that wherever she is, she would be looking out for me.

Contents

List of Figures
List of Tables
1 Introduction
   1.1 Project background
   1.2 Project description
   1.3 Project approach
   1.4 Thesis organization
2 Neuromorphic Computing Background
   2.1 Neuron models
       Binary neuron model
       Continuous value neuron model
       Spiking neuron model
   2.2 Neural network
       Feed-forward neural network
       Convolution neural network
       Recurrent neural network
       Deep Belief Network
   2.3 Learning
       Learning with continuous value and binary neuron
       Learning with spiking neurons
   2.4 Neuromorphic Computing
       Basics of Neuromorphic Computing
       Converting continuous value neural network to Spiking neural network
3 Scalable Neuromorphic Interconnects
   3.1 Communication in neuromorphic computing
   3.2 Mesh network
   3.3 Segmented bus
   3.4 Two-stage NoC

4 Network Simulator
   4.1 Discrete-Event and Cycle-Accurate simulators
   4.2 Simulator choices
   4.3 Simulator software models
       Class diagram
       Use case diagram
   4.4 Intermediate simulation results
       3x3 mesh experiment
       4x4 mesh experiment
       Discussion
5 Neuron Cluster Mapping Problem
   5.1 Problem formulation
   5.2 Proof of NP-hardness
   5.3 Solution algorithms
       Exact solution
       Heuristics
   5.4 Solution comparison
   5.5 Discussion
6 Conclusion and Future Work
   6.1 Conclusion
   6.2 Future work
Bibliography

List of Figures

1.1 Project Overview
2.1 Artificial neuron model
2.2 Different activation functions
2.3 An example of feed-forward neural network
2.4 An example of Convolution Neural Network
2.5 Training a Deep Belief Network
2.6 An example of Forward pass (L) and Backpropagation (R) algorithm
2.7 Spike Timing Dependent Plasticity
2.8 A fully digital integrate and fire neuron model
2.9 A post-synaptic learning circuit
3.1 A 4x4 mesh network
3.2 A typical mesh router
3.3 An example of segmented bus network
3.4 Full crossbar segmented bus
3.5 Two-stage routing for spiking neural network
4.1 System modelling graph
4.2 Simulator class diagram
4.3 Simulator use case diagram
4.4 Latency and dynamic energy consumption with different mappings in 3x3 mesh network
4.5 Latency and dynamic energy consumption with different routing strategies in 3x3 mesh network
4.6 Latency and dynamic energy consumption with different mappings in 4x4 mesh network
4.7 Latency and dynamic energy consumption with different routing strategies in 4x4 mesh network
5.1 An example of reducing TSP to cluster mapping problem
5.2 Comparison of different mapping solutions in 3x3 mesh network
5.3 Comparison of different mapping solutions in 4x4 mesh network
5.4 Comparison of different mapping solutions in 7x6 mesh network
5.5 Comparison of different mapping solutions in 8x8 mesh network
5.6 Average latency and cost function for different mappings in 4x4 mesh network
5.7 Dynamic energy and cost function for different mappings in 4x4 mesh network

List of Tables

4.1 Simulator preliminary comparison 1
4.2 Simulator preliminary comparison 2
4.3 Simulator detailed comparison
5.1 Standard deviation in different experiments

List of Abbreviations

AER    Address-Event Representation
BEOL   Back End of Line
BFS    Breadth First Search
CNN    Convolution Neural Network
DPI    Differential Pair Integrator
ILP    Integer Linear Programming
INI    Institute of Neuroinformatics
MIPS   Microprocessor without Interlocked Pipeline Stages
NoC    Network-on-Chip
NP     Nondeterministic Polynomial Time
QAP    Quadratic Assignment Problem
RBM    Restricted Boltzmann Machine
RNN    Recurrent Neural Network
RTL    Register Transfer Level
SNN    Spiking Neural Network
STDP   Spike Timing Dependent Plasticity
TFT    Thin Film Transistor
TLM    Transaction Level Modelling
TSP    Travelling Salesman Problem
UML    Unified Modelling Language
VLSI   Very Large Scale Integration
WTA    Winner-Take-All

Chapter 1
Introduction

1.1 Project background

The Von Neumann architecture has been the main workhorse for computation and data processing for the past 50 years. However, with the large amount of unstructured data being generated that requires analysis and classification, it is not possible to simply scale up the Von Neumann architecture to meet this requirement, due to the communication bottleneck between its memory and processing unit. Many research consortia have been active in the last few years to address this challenge. Likewise, IMEC, in cooperation with several partners, has also been conducting research on the development of a neuromorphic computing platform. The overall strategy is to jointly develop:
Neural algorithms and computational architectures (to be run in)
Neuromorphic information processing systems (which will make use of)
Local synapse arrays (and which will all be integrated in)
3D VLSI technology (and which will support large scale multi-chip systems via)
advanced thin film transistor (TFT) based interconnect using back end of line (BEOL) technology for scaling up to a large number of neuron clusters (of which this research is a part)

1.2 Project description

Neuromorphic computing devices circumvent the bottleneck of the Von Neumann architecture by having the processing and memory elements (neurons and synapses, respectively) located very close to each other. However, this new architecture also faces a strong requirement to provide communication infrastructure for the large number of synaptic fan-outs associated with a neuron. In hardware implementations, neurons are usually divided into clusters, which also divides the synaptic connections between neurons into two levels. The first level is the local synaptic connection inside a neuron cluster, where all neurons are fully connected to each other. The second level is the connection between neuron clusters, which is called a global synapse. Our project focuses on the second type of connection, the global synapse, with the main purpose of exploring different interconnect options that can be implemented on silicon.

In summary, the project aims at developing a scalable, highly dynamic and flexible communication network for neuromorphic clusters that requires low operational power, comparable to the human brain.

1.3 Project approach

An overview of the activities required to achieve the aforementioned project goal is shown in figure 1.1.

Figure 1.1: Project Overview

The project begins with an investigation to get an overview of the state-of-the-art practices in neuromorphic computing and neural networks. In the next step, a further literature study regarding different interconnect models for neuromorphic computing is carried out. This is followed by the selection of a hardware network simulator that can be modified to conform to our requirements. In particular, this simulator needs to have an interface that can take as input the spike communication traffic generated by an application (behaviour) level simulator. This traffic is provided by Francesco Dell'Anna, another student working closely on the same project [1]. Afterwards, the investigated global interconnect models are built and integrated into the hardware network simulator. This includes taking into account the power and delay models of TFT-based switching elements, based on IMEC technology. Once the simulator is operational, we perform intermediate experiments to determine which aspects of the network model affect performance and thus are critical for optimization. Finally, different optimization techniques are proposed to determine the best mapping from neuron clusters onto the interconnect hardware network.

1.4 Thesis organization

This thesis is organized as follows: Chapter 1 gives information about the project goals and overall approach. Chapter 2 presents an overview of neuromorphic computing elements. The interconnect models used to represent the global synaptic connections between neuron clusters are described in Chapter 3.

The literature survey performed to select a network simulator is discussed in Chapter 4. Also in this chapter, the simulator software architecture and its usage, together with some intermediate simulation results, are reported. These results lead to the formulation of a further research problem, optimizing the mapping of neuron clusters onto network nodes, which is discussed in Chapter 5. In the same chapter, various algorithms for solving this problem as well as their simulation results are described and compared. Chapter 6 concludes the thesis and lists some interesting topics that can be further explored in future work.

Chapter 2
Neuromorphic Computing Background

2.1 Neuron models

In the brain, neurons are the cells that are responsible for processing and transmitting information. A typical neuron consists of a body (soma), dendrites (inputs) and an axon (output). The connection between a dendrite and an axon is called a synapse. To mimic the functionality of the brain in order to perform recognition and classification tasks, it is important to understand and correctly model the behaviour of neurons. Over the past few decades, many neuron models have been developed for performing computation and they can be classified into three main types [2]: binary signal neuron; continuous value neuron; spiking neuron.

Binary neuron model

The binary neuron model was jointly developed by McCulloch and Pitts in 1943 [3]. This model takes the weighted sum of the inputs and then compares the result with a threshold value: if the sum is larger than the threshold value, the neuron will give 1 as output, otherwise the output will be 0. This comparison step is called the activation function of a neuron model, as shown in figure 2.1.

Figure 2.1: Artificial neuron model

Continuous value neuron model

The continuous value neuron model, as its name suggests, is different from the binary neuron model in that its output is a continuous value instead of a binary one. The activation function of this neuron model is usually a sigmoid function such as the hyperbolic tangent or the logistic function. The output of a neuron can be interpreted as either the value itself, or as the probability of producing 1 as output. Apart from sigmoid functions, in recent years the Rectifier (also called ReLU, Rectified Linear Unit, as shown in figure 2.2) has also been widely adopted as an activation function in the Machine Learning and Neural Network world. Most of the state-of-the-art Machine Learning algorithms employ this type of neuron model.

Figure 2.2: Different activation functions
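As a concrete illustration of the weighted-sum-and-threshold behaviour and of the activation functions mentioned above, the following minimal Python sketch implements a McCulloch-Pitts style binary unit next to the sigmoid and ReLU functions; the weights and threshold are arbitrary values chosen for illustration only.

```python
import numpy as np

def binary_neuron(x, w, threshold):
    """McCulloch-Pitts unit: weighted sum of the inputs compared against a threshold."""
    return 1 if np.dot(w, x) >= threshold else 0

def sigmoid(a):
    """Logistic activation, as used by continuous value neurons."""
    return 1.0 / (1.0 + np.exp(-a))

def relu(a):
    """Rectified Linear Unit activation."""
    return np.maximum(0.0, a)

x = np.array([1.0, 0.0, 1.0])      # example inputs
w = np.array([0.5, -0.3, 0.8])     # example synaptic weights
print(binary_neuron(x, w, threshold=1.0))        # -> 1, since 0.5 + 0.8 >= 1.0
print(sigmoid(np.dot(w, x)), relu(np.dot(w, x)))  # continuous value outputs
```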

Spiking neuron model

The third type of neuron model is the spiking neuron model. This type of model takes spiking events as input and also outputs spiking events. Information is stored in the timing of spike events instead of being interpreted as a spiking frequency as in the binary and continuous value neuron models. The spiking neuron model is considered the most biologically plausible among the three types, as transmitting spikes is how real neurons communicate with each other. In such a model, a neuron computes the weighted sum of all the spiking input currents integrated over time; when the membrane potential rises above a certain threshold, the neuron fires a spike. A spiking neuron model can usually be described using an electronic circuit or a set of ordinary differential equations, such as the leaky integrate and fire model, the Izhikevich model, the Hodgkin-Huxley model, etc.
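The integrate-and-fire behaviour described above can be illustrated with a toy leaky integrate-and-fire sketch; the time constant, threshold and input current below are arbitrary illustrative values and are not taken from any of the cited models.

```python
import numpy as np

def simulate_lif(input_current, dt=1e-3, tau=20e-3, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Toy leaky integrate-and-fire neuron: integrate the input current over time
    and emit a spike (then reset) when the membrane potential crosses the threshold."""
    v = v_rest
    spike_times = []
    for step, i_in in enumerate(input_current):
        # leaky integration of the membrane potential (forward Euler)
        v += dt / tau * (-(v - v_rest) + i_in)
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset
    return spike_times

# a constant supra-threshold input produces a regular spike train
print(simulate_lif([1.5] * 200))
```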

2.2 Neural network

A single neuron would not be able to represent or compute much information. However, a network of neurons connected together is much more computationally powerful than the mere addition of single neurons. Computation is therefore usually performed in the context of a neural network. Typically, neurons are organized into layers in such a network. In the Machine Learning world, many types of neural networks have been developed to solve different tasks. Usually, the connection setup between layers defines the neural network type. The networks described in this section are all classified as rate based neural networks, in contrast to the spiking ones.

Feed-forward neural network

In this type of network, neurons are connected in a feed-forward fashion: only neurons from the lower layer give input to the neurons from the layer directly above; there are no connections between neurons from the same layer, no skip-layer connections, and no feedback connections from the higher levels back to the lower ones. A common feed-forward neural network will have full connectivity between two adjacent layers, as seen in figure 2.3.

Figure 2.3: An example of feed-forward neural network

Convolution neural network

The Convolution Neural Network (CNN) is a special kind of feed-forward neural network. It was developed with the main intention of solving image recognition tasks and was inspired by the cat's visual system. What makes a CNN different from a normal feed-forward neural network is that it has sparse connections: only a subset of the lower layer neurons is connected to a subset of the adjacent higher layer. The connections in a CNN are divided into two stages: the convolution and sub-sampling stages. In the convolution stage, the input units are overlapped and go through the entire input. The weights are shared from the input units to the receptive layer, hence the name convolution. In the sub-sampling stage, the input layer is partitioned into a set of non-overlapping rectangular regions and, for each such sub-region, the maximum value is output.

Figure 2.4: An example of Convolution Neural Network [4]

Recurrent neural network

The Recurrent Neural Network (RNN) is different from the feed-forward neural network in that in an RNN there are lateral connections between neurons of the same layer and/or feedback connections from higher level layers back to lower level ones. Training is usually more difficult in recurrent neural networks than in feed-forward ones. This is due to the gradient vanishing/exploding problems, since RNNs are usually trained as deep multilayer feed-forward neural networks. There are different subtypes of RNNs. They differ from each other in the way the connection weights are initialized and learned:
Long Short Term Memory: introduction of a memory cell with write, keep, and read gates [5];
Echo State Network: connection weights between neurons in the hidden layer are not changed; only the weights from the hidden layer to the output layer are updated during training [6];
Hopfield Net: weights are updated to converge to a minimum energy instead of through backpropagation [7].

Deep Belief Network

A Deep Belief Network is built by stacking Restricted Boltzmann Machines (RBM) on top of one another. An RBM is a special Boltzmann Machine without lateral connections between units in the same layer; visually, it looks like a bipartite graph. The training of such a network is carried out in a greedy fashion as shown in figure 2.5: at first, the weights between the lowest hidden layer and the visible layer are trained independently as an RBM. Then, the lowest hidden layer can be used as data to train the next hidden layer. The training process can then be performed iteratively until the last hidden layer. This makes training such a network much simpler compared to the traditional backpropagation method and thus more appealing as an alternative solution for other neural network types. One way to train an RBM layer is as follows. First, set the states of the visible units to a training vector, let this be called $v$. Then the binary states $h$ of the hidden units are all computed in parallel. Once binary states have been chosen for the hidden units, a reconstruction $v'$ of the visible units is sampled by using the hidden units. Again, calculate the binary states $h'$ of the hidden units using these reconstructed visible units. The change in a weight is then given by the formula $\Delta w_{ij} = \epsilon (\langle v_i h_j \rangle - \langle v'_i h'_j \rangle)$, where $\epsilon$ represents a learning rate [8].
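A minimal numerical sketch of this update rule (a single contrastive-divergence CD-1 step on one training vector, with bias terms omitted and layer sizes chosen arbitrarily for illustration) could look as follows.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_step(v, W, epsilon=0.1, rng=np.random.default_rng(0)):
    """One CD-1 weight update for an RBM, following the <v h> - <v' h'> rule above."""
    # positive phase: sample binary hidden states from the data vector v
    h = (sigmoid(v @ W) > rng.random(W.shape[1])).astype(float)
    # negative phase: reconstruct the visible units, then recompute hidden activities
    v_recon = (sigmoid(h @ W.T) > rng.random(W.shape[0])).astype(float)
    h_recon = sigmoid(v_recon @ W)          # probabilities suffice for the estimate
    # weight change: epsilon * (<v h> - <v' h'>)
    return W + epsilon * (np.outer(v, h) - np.outer(v_recon, h_recon))

W = np.random.default_rng(1).normal(0, 0.01, size=(6, 3))   # 6 visible, 3 hidden units
v = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])                # one training vector
W = cd1_step(v, W)
```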

Figure 2.5: Training a Deep Belief Network [9]

Interestingly, there has been research on combining these different types of neural network models to form even better performing ones [10].

2.3 Learning

A connection between neurons is called a synapse. Each synapse has a connection weight that can be used to store information. The learning process then consists of changing the weights of these connections so that the output neurons give meaningful values. There are three types of learning:
Unsupervised learning: discover a good internal representation of the input data.
Supervised learning: learn to predict an output when given a set of inputs, after being exposed to some training data.
Reinforcement learning: a special form of supervised learning, where the desired effect is learning to select an action that maximizes pay-off.
This section describes which type of learning is used and how it is applied to networks of the different neuron models.

Learning with continuous value and binary neuron

For these types of neuron models, training a neural network is usually a form of supervised learning and is often associated with a cost function. For example, the cost function can be the mean squared error or the softmax. The objective of the learning process is to reduce the cost as much as possible given the inputs.
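As a toy illustration of a cost function and of the gradient-based weight updates discussed next, the following sketch trains a single linear unit with the mean squared error and plain gradient descent; the data and learning rate are arbitrary and only meant to show the mechanics.

```python
import numpy as np

def mse_cost(y_pred, y_true):
    """Mean squared error between the network outputs and the training targets."""
    return np.mean((y_pred - y_true) ** 2)

def gradient_step(w, x, y_true, learning_rate=0.01):
    """One gradient-descent step for a single linear unit trained with MSE:
    move the weights in the direction opposite to the gradient of the cost."""
    y_pred = x @ w
    grad = 2.0 / len(x) * x.T @ (y_pred - y_true)   # dCost/dw
    return w - learning_rate * grad

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))               # 32 training samples, 4 inputs each
true_w = np.array([1.0, -2.0, 0.5, 0.0])
y = x @ true_w
w = np.zeros(4)
for _ in range(500):
    w = gradient_step(w, x, y)
print(mse_cost(x @ w, y))                  # the cost shrinks towards zero
```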

Most of the time, the learning algorithm is based on gradient descent. Gradient descent is an optimization algorithm in which, to find a local minimum, one takes steps that are proportional to the negative of the gradient of the cost function at the current point. Learning in the network is then performed using some form of backpropagation. In backpropagation, for any given training signal, the error of a unit is calculated as the partial derivative of the cost function with respect to the value of that unit. Between any two consecutive layers in a neural network, the errors of their connection weights are calculated based on the error of the output layer using the derivative chain rule. These weights are then updated by an amount proportional to their errors. This process is carried out layer by layer until all weights are updated.

Figure 2.6: An example of Forward pass (L) and Backpropagation (R) algorithm [11]

Learning with spiking neurons

Learning in spiking neural networks is devised based on synaptic plasticity changes in biology. One of the most important rules for learning in such networks is Spike Timing Dependent Plasticity (STDP). The main principle of STDP is that synaptic plasticity is updated according to the difference in spike timing between the pre- and post-synaptic neurons. There are many variations of STDP, but all of them follow this basic principle. For example, in figure 2.7, when the pre-synaptic neuron j fires within approximately 40ms before the post-synaptic neuron i does, the synaptic weight from j to i will increase. The closer the firing times of the two neurons are, the more the synaptic weight will increase. Vice versa, if the pre-synaptic neuron fires within approximately 40ms after the post-synaptic neuron, the synaptic weight will decrease. STDP can be used for both supervised and unsupervised learning. Because learning in the spiking neuron model is closer to biology and happens more locally (neurons from a lower level do not need to wait for information from a higher level to update their synaptic weights as in backpropagation), the spiking neuron model is the preferred one to build on neuromorphic hardware. Since the weights can be updated based on the local condition of the neuron, the learning circuits will be simpler for a spiking neural network than for rate based ones.
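A minimal sketch of a pair-based STDP window in the spirit of figure 2.7 is given below; the amplitudes and time constant are illustrative placeholders, not the parameters of any published rule.

```python
import numpy as np

def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20e-3):
    """Toy pair-based STDP window: potentiate when the pre-synaptic spike precedes
    the post-synaptic one, depress otherwise; the closer the two spike times, the
    larger the weight change (parameters are illustrative only)."""
    dt = t_post - t_pre
    if dt >= 0:      # pre fires before post -> strengthen the synapse
        return a_plus * np.exp(-dt / tau)
    else:            # pre fires after post -> weaken the synapse
        return -a_minus * np.exp(dt / tau)

print(stdp_delta_w(t_pre=0.010, t_post=0.015))   # small positive update
print(stdp_delta_w(t_pre=0.015, t_post=0.010))   # small negative update
```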

Figure 2.7: Spike Timing Dependent Plasticity [12]

2.4 Neuromorphic Computing

Basics of Neuromorphic Computing

The field of Neuromorphic Computing was pioneered by Carver Mead, who saw similarities between neural systems and VLSI circuits [13]. This field of research focuses on how to build biologically-inspired models of neural systems at the hardware level. These implementations of neuromorphic hardware can in turn be used to run biologically-inspired neural network algorithms efficiently, for tasks where the brain is usually better than the computer, such as image and speech recognition. There are quite a few different trends in developing neuromorphic hardware: digital versus analog, online learning versus static weights, etc. A typical neuromorphic hardware platform will have circuits mimicking neuron functionalities such as integrating spikes, firing spikes based on a membrane threshold, and performing learning rules. These are implementations of the spiking neuron models mentioned in section 2.1. For example, in figure 2.8, an 18-bit adder and accumulator are used for spike integration in this neuron model. In the second stage, the accumulated value is compared with a threshold using the MUX and XOR blocks to generate a spike event that is sent to the address-event representation (AER) interface (which will be explained in more detail in chapter 3) [14]. The analog post-synaptic learning circuit represented in figure 2.9 is more sophisticated. This circuit contains a Differential Pair Integrator (DPI), which integrates the post-synaptic neuron spikes and produces a current proportional to the neuron's Calcium concentration. The three Winner-Take-All (WTA) circuits compare the Calcium concentration current with three thresholds for weight changes or to stop learning.

Figure 2.8: A fully digital integrate and fire neuron model [14]

Figure 2.9: A post-synaptic learning circuit [15]

Converting continuous value neural network to Spiking neural network

Although the SNN has some advantages over rate based neural networks, overall there is no clear general purpose algorithm or architecture for such networks to learn arbitrary tasks, especially in a supervised way. Therefore, there has been research conducted to convert rate based neural network algorithms to SNNs.

Using such a conversion process, rate based neural network algorithms can be run on the neuromorphic computing elements introduced earlier in this section, which are much more power efficient than the conventional computers used nowadays for Machine Learning. The basic principle of a conversion rule according to [16] and [17] is as follows (a sketch of the rate-to-spike step is given below):
Train the network offline using continuous valued neurons and well established algorithms (Convolution Network, Recurrent, or Deep Belief Network).
Use the weights obtained from training for the SNN.
Normalize and then convert the input values to Poisson spiking rates.
Convert the output of the neurons to spiking rates using appropriate algorithms.
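The rate-to-spike step of this procedure can be sketched as follows; the maximum firing rate, duration and time step are assumptions made for illustration and are not taken from [16] or [17].

```python
import numpy as np

def to_poisson_spike_train(activation, max_rate=100.0, duration=0.1, dt=1e-3,
                           rng=np.random.default_rng(0)):
    """Encode a normalized activation (0..1) from a rate-based network as a Poisson
    spike train, as in the conversion procedure above. Returns spike times in seconds."""
    rate = activation * max_rate                  # target firing rate in spikes/second
    n_steps = int(duration / dt)
    spikes = rng.random(n_steps) < rate * dt      # Bernoulli approximation per time step
    return np.flatnonzero(spikes) * dt

print(to_poisson_spike_train(0.8))   # roughly 8 spikes in 100 ms on average
```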

Chapter 3
Scalable Neuromorphic Interconnects

3.1 Communication in neuromorphic computing

A challenging task in building neuromorphic computing hardware is trying to accommodate the number of synaptic connections within the human brain: a typical neuron has approximately 5000 to synaptic connections. One of the reasons that this is difficult to implement in hardware is that brain synapses are connected in 3D, while the VLSI approach usually can only provide connections in 2D. Additionally, the area overhead and energy consumption for wiring and I/Os would be very large if those connections were implemented physically, especially when the number of neurons approaches hundreds of millions or even billions as in the human brain.

One way to reduce the complexity of implementing a large number of connections in hardware is to utilize time multiplexing. This is sensible because spiking activity in the brain happens in the time range of milliseconds, and VLSI communication hardware is 3 to 6 orders of magnitude faster than that. The de facto way to transmit spike events in large neuromorphic systems is the address-event representation (AER) protocol. Using this protocol, spike events are broadcast digitally with only the address of the neuron that emits the spike [18]. Time represents itself in such a configuration. Usually, the width and amplitude of the spike are not transmitted as they do not contain useful information [19]. However, in case such information is needed, the AER protocol can be extended to include the required data.

Another way to reduce the number of connections is to try to identify the structure of the neural network and remove the connections that are not necessary. In brain networks, it has been found that neurons are connected in a small-world structure: neurons that are close together group into clusters and are connected in a (near) clique fashion; between clusters there are long connections that greatly reduce the path length between neurons from different clusters [20]. This is reflected in recent developments in neuromorphic computing: artificial neurons are grouped into clusters where they are fully connected (local connectivity), and in between clusters there are interconnects, where spiking communication happens less frequently (global connectivity).

Both the cxquad chip [21] developed by the Institute of Neuroinformatics (INI) in Zurich and the TrueNorth chip developed by IBM utilize the aforementioned approaches for reducing communication complexity. The cxquad chip implements a two-stage Network-on-Chip (NoC) for the long interconnect, to reduce memory requirements while still retaining some flexibility for the network to adapt [22].
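To make the AER idea from this section concrete, the sketch below packs a spike event into a single address word and unpacks it again; the field layout and cluster size are hypothetical and do not correspond to the format used by any particular chip.

```python
from collections import namedtuple

# Minimal address-event representation sketch: only the address of the spiking
# neuron travels over the interconnect; time represents itself at the moment the
# event is sent. The encoding below is illustrative only.
AddressEvent = namedtuple("AddressEvent", ["cluster_id", "neuron_id"])

def encode(event, neurons_per_cluster=256):
    """Pack an address event into a single integer word."""
    return event.cluster_id * neurons_per_cluster + event.neuron_id

def decode(word, neurons_per_cluster=256):
    """Recover the cluster and neuron address from the word."""
    return AddressEvent(word // neurons_per_cluster, word % neurons_per_cluster)

word = encode(AddressEvent(cluster_id=3, neuron_id=17))
print(word, decode(word))   # 785 AddressEvent(cluster_id=3, neuron_id=17)
```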

IBM's TrueNorth chip, on the other hand, implements a time multiplexed mesh based network for the long interconnect [23]. These two approaches, together with the dynamically controlled segmented bus network developed at IMEC Leuven, will form the main interconnect models for comparison in this project.

3.2 Mesh network

Mesh networks are frequently used in NoC architectures. A mesh network is usually represented in 2D, although with recent developments 3D meshes are also being explored in hardware. For simplicity, our focus in this project is on the 2D mesh. From this point forward, the term mesh network is interpreted as 2D mesh. A regular m x n mesh network has m tiles in every row and n tiles in every column. Each tile consists of a processing element (a neuron cluster in our case) and a router.

Figure 3.1: A 4x4 mesh network

Since each tile is connected to its X and Y neighbours with full duplex communication, a typical (non-edge) mesh router will have 5 input and 5 output ports (see figure 3.2). These input and output ports are used to communicate in the North, East, South, West, and local directions. Usually, every input port has a buffer associated with it to decouple input and output communication. A mesh network is quite flexible in the sense that it can provide connectivity between any two neuron clusters in the network. However, since the traffic needs to pass by other neuron clusters in the network, there is a high probability of overlap between communication links, which can lead to high latencies in the network.
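The following toy sketch walks a spike packet through such a mesh using dimension-ordered (XY) routing, one of the routing strategies used in the experiments later in this thesis, and counts the hops; the coordinates and mesh size are illustrative.

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) routing in a 2D mesh: first move along X until the
    destination column is reached, then along Y. Returns the tiles visited."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

path = xy_route((0, 0), (3, 2))
print(path, "hops:", len(path) - 1)   # Manhattan distance of 5 hops
```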

Figure 3.2: A typical mesh router

3.3 Segmented bus

A segmented bus (as seen in figure 3.3) is an improvement over a traditional bus network. In a traditional bus network, any communication between two elements will prevent the rest of the network from sending and receiving data. Additionally, the entire bus always needs to be powered up, even when the communication distance is short. A segmented bus overcomes these drawbacks by dividing the bus into small segments separated by switches. Thanks to this division, when only short distance communication is required, the rest of the bus can be powered down to save energy. Moreover, a segmented bus can also facilitate parallel communication if the paths do not overlap. Control of the segmented bus is carried out mainly in software [24]. At IMEC, a dynamically controlled version of such a segmented bus network has been proposed and designed. The specifics of this design are not essential for this project, therefore they will not be discussed here.

Figure 3.3: An example of segmented bus network [24]

Our approach for exploring the segmented bus architecture is to start with a full crossbar of switches. Neuron clusters are placed only at the edges of the network. Communication in four directions instead of three can be achieved by connecting two three-way switches, rotated 180 degrees with respect to each other, by a short wire, as shown in figure 3.4.

Figure 3.4: Full crossbar segmented bus

3.4 Two-stage NoC

The two-stage NoC was developed to meet the large fan-out demand in neural networks while reducing memory requirements. For a network of N neurons divided into clusters of C neurons, the two-stage NoC will have N/C intermediate nodes (routers), each associated with a cluster. If each neuron requires F fan-outs, the routing scheme is as follows. The first stage involves a neuron sending message copies to F/M routers, where M represents the number of fan-outs in the second stage. This is carried out using point to point communication. In the second stage, each router then broadcasts the same message to the C neurons in the cluster it is associated with. Each neuron has a set of tags to check whether a received message is intended for it. If the tags are uniformly distributed, a cluster will have M (M ≤ C) neurons carrying the same tag. In total, this yields M × F/M = F fan-outs for any single neuron.
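The fan-out arithmetic of this scheme can be sketched as follows; the values of F, M and C are arbitrary examples, not parameters of the cxquad design.

```python
def two_stage_targets(fanout_F, second_stage_M, cluster_size_C):
    """Sketch of the two-stage fan-out: a spike is first sent point-to-point to
    F/M intermediate routers, each of which broadcasts to the C neurons of its
    cluster; tag matching selects M neurons per cluster, so M * (F/M) = F neurons
    receive the spike in total."""
    assert fanout_F % second_stage_M == 0 and second_stage_M <= cluster_size_C
    first_stage_messages = fanout_F // second_stage_M   # point-to-point copies sent
    matched_per_cluster = second_stage_M                # neurons whose tag matches
    return first_stage_messages, matched_per_cluster * first_stage_messages

print(two_stage_targets(fanout_F=1024, second_stage_M=64, cluster_size_C=256))
# -> (16, 1024): 16 router messages reach 1024 neurons in total
```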

Figure 3.5: Two-stage routing for spiking neural network [22]

Chapter 4
Network Simulator

4.1 Discrete-Event and Cycle-Accurate simulators

In order to select a suitable network simulator for supporting the topology exploration for the global synapse communication, an important decision has to be made regarding the simulation time granularity. Cycle-accurate simulators, as the name suggests, can give the correct timing of the network and/or computation elements down to the cycle level. Discrete-event simulators, on the other hand, model the operation of a system as a sequence of events over time. When there is no activity, time can be skipped forward to the next event. This can potentially help reduce simulation time, at the cost of losing timing accuracy.

Figure 4.1: System modelling graph [25]

Figure 4.1 shows the level of timing detail that can be implemented for communication and computation elements in a system simulation. For our simulator, since we are exploring a new communication architecture that will be built on actual hardware, we need timing accuracy for the network elements to obtain more information on system behaviour in critical conditions. Regarding the neuron clusters (computation elements), we only need approximate timing as this is not the focus of the simulator. In a later phase of development, once we have the network type chosen, the simulator can be extended to have more relaxed timing accuracy on the network elements for faster simulation of large networks (up to neuron clusters). Thus, we position our simulator on points C and D in figure 4.1.
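To illustrate the event-driven principle (as opposed to advancing cycle by cycle), the following minimal discrete-event loop processes time-stamped spike events from a priority queue; it is purely illustrative and does not reflect the internals of any of the simulators compared below.

```python
import heapq

def run_discrete_event(events, handlers):
    """Minimal discrete-event loop: events are (time, kind, payload) tuples kept in
    a priority queue; simulated time jumps straight to the next event instead of
    advancing cycle by cycle."""
    queue = list(events)
    heapq.heapify(queue)
    while queue:
        time, kind, payload = heapq.heappop(queue)
        for new_event in handlers[kind](time, payload):
            heapq.heappush(queue, new_event)

def on_spike(time, payload):
    print(f"{time * 1e3:.1f} ms: spike from cluster {payload}")
    return []   # a real handler would schedule the packet's arrival downstream

run_discrete_event([(0.001, "spike", 4), (0.0005, "spike", 2)], {"spike": on_spike})
```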

4.2 Simulator choices

It is possible to develop our own hardware network simulator from scratch. However, that would mean spending a lot of time reinventing the wheel for simpler features that are already available in a well-developed network simulator. Hence, we choose to find a suitable simulator and then modify it according to our requirements. As this is an important step in the project, two comparisons were made to choose the most suitable candidates.

The first round of comparison is a preliminary study of different simulators. We study the literature to find out whether a candidate simulator can provide Cycle-Accurate simulation and how many different network topologies it can support. It is also important to check when these simulators were last updated and whether their source code is available to download and modify. For each criterion, we rate the candidate simulators found in the literature using a scale from 1 to 10. The candidates with the highest total scores are selected for further investigation. The results are shown in tables 4.1 and 4.2.

Table 4.1: Simulator preliminary comparison 1

BookSim 2 [26]
  Cycle accuracy: 10 - available.
  Adaptability: 9 - has many topologies and has been used to simulate a neuromorphic memristor based accelerator.
  Code availability: 10 - open source; there is a mailing list for questions to the developers.
  Total: 29

Uni Luebeck Simulator [27]
  Cycle accuracy: 10 - available.
  Adaptability: 5 - no information.
  Code availability: 0 - code not available.
  Total: 15

Atlas [28]
  Cycle accuracy: 7 - available, but depends on an external RTL simulator (ModelSim).
  Adaptability: 7 - has only mesh/torus topologies.
  Code availability: 5 - code is available but not maintained.
  Total: 19

Table 4.2: Simulator preliminary comparison 2

Noxim [29]
  Cycle accuracy: 10 - available.
  Adaptability: 6 - has only a 2D mesh topology, but it has been extended for others and has also been used for neuromorphic computing simulation.
  Code availability: 9 - open source; the developer responds within a few days.
  Total: 25

NIRGAM [30]
  Cycle accuracy: 10 - available.
  Adaptability: 7 - has only mesh/torus topologies.
  Code availability: 10 - open source; we have good contact with the developers.
  Total: 27

HORNET [31]
  Cycle accuracy: 10 - available.
  Adaptability: 9 - many topologies can be created, but the processing element is limited to only the MIPS architecture.
  Code availability: 5 - code is available but not maintained.
  Total: 24

Following the preliminary research results, three promising simulators are selected for further investigation, namely: BookSim 2, Noxim, and NIRGAM. In the second round, the simulators are downloaded and their source code is examined for comparison. With the source code available, we can test these simulators to see whether they provide trace driven traffic, which will be useful for modelling spike communication traffic from application-level simulators. As we aim to perform simulations for large scale networks, we also test the largest number of nodes that each of these simulators can support and whether they have a speed-up option for fast simulation. Finally, the software complexity is also taken into account, as we would need to understand the simulator software structure before we can modify it. The results of the second round of comparison are shown in table 4.3. Noxim, with the highest score in the second round of comparison, is our simulator of choice for further development of the proposed interconnect models. As more literature is available on mesh networks and Noxim also supports the mesh topology, our first phase of development focuses on building simulation models of neuron cluster communication in mesh networks. The segmented bus and two-stage NoC topologies will be implemented in a later phase.

Table 4.3: Simulator detailed comparison

Noxim
  Trace driven: 5 - based on packet injection rate; needs to be modified.
  Scalability: 9 - can run a 30x30 mesh size without changing code.
  Speedup: 6 - Transaction Level Modelling is implemented for WiNoC; we can try to replicate this.
  Software complexity: 8 - simple code.
  Total: 28

NIRGAM
  Trace driven: 8 - trace based on previous simulation.
  Scalability: 5 - current limit is 16x16.
  Speedup: 3 - no speedup, but SystemC Transaction Level Modelling can be implemented.
  Software complexity: 6 - simple code, but the code organization is not intuitive.
  Total: 22

BookSim 2
  Trace driven: 8 - implemented, but the format is not clear.
  Scalability: 7 - can run a 30x30 mesh size, but quite a few simulation parameters need to be changed.
  Speedup: 5 - speedup based on changing the relative speed between routing elements and channels in normal simulation.
  Software complexity: 6 - well written code, but the complexity level is high.
  Total: 26

4.3 Simulator software models

After selecting the simulator, it is important to specify what we want to do with the software and understand how it is structured before development. For this purpose, we create models of the simulator following the UML convention. In particular, two types of UML diagrams are presented in this section:
Class diagram: shows the internal architecture of the software.
Use case diagram: shows what the user can do with the simulator.

Class diagram

The class diagram shown in figure 4.2 covers not only the original Noxim but also the extension we want to build for simulating neuromorphic computing cluster communication.

Figure 4.2: Simulator class diagram

Due to size limitations, the following details had to be omitted from the class diagram:

Routing Strategy is the generalization class of the following routing strategies: Dyad; Negative first; North last; Odd even; Table based; West first; XY.

Traffic Model is able to generate traffic based on the following traffic models: Random; Transpose matrix; Bit-reversal; Butterfly; Shuffle; Table based.

Configuration Manager loads input parameters from a file and passes the required information to other classes. The input parameters include, but are not limited to: network topology; network size; traffic type; routing strategy; simulation time.

Power Model loads power (and potentially area) profiles from a file and provides power (and area) calculations for other classes like Routing Element or Processing Element.

Use case diagram

The use case diagram shows what the user can do with the simulator. For example, when changing the network topology, the user should be able to select between the three interconnect types that we intend to use: Mesh, Segmented bus, and Two-stage NoC. The user should also be able to input traffic from the application-level simulator to run a hardware network simulation. The option of providing routing information through a file will be useful when we explore full software routing control on the segmented bus.

Figure 4.3: Simulator use case diagram

4.4 Intermediate simulation results

After implementing the mesh network in the simulator, we perform some experiments to verify that the simulator is working as expected. For example, to test the scalability of the simulator, we try to simulate a large network. We succeeded in performing simulations for a mesh network of up to ( ) nodes. This takes 1 hour and 15 minutes to simulate 30ms of spike traffic, with 2.2 million spiking events in total. Furthermore, we need to identify performance issues for architectural optimization in the mesh network. Since we are looking for low power implementations, our main focus for comparison in these experiments is energy consumption. Another aspect worth investigating is latency, as high latency of spike communication can have a potential effect on STDP learning and lead to degradation of the neural network application.

3x3 mesh experiment

The first experiment is performed using spike traffic generated from a synthetic example. In this experiment, the neural network consists of 27 neurons, divided into 3 layers (i.e. input, output, and a hidden layer) of 9 neurons each. Three neurons in the same layer are grouped into a cluster based on their ID, with a total of 9 clusters for the whole network. The traffic load is light, with a total of 892 spike events during 30ms of simulated time; this is equal to a spike rate of 3300 spikes/second per cluster. We then test the effect on dynamic energy consumption and latency when placing these neuron clusters at different locations on a 3x3 mesh network. As there are c! combinations for mapping c clusters onto c network locations, we only simulate 10 random mappings. All simulations are carried out using XY routing. The results are normalized and shown in figure 4.4.

Figure 4.4: Latency and dynamic energy consumption with different mappings in 3x3 mesh network

Using the same setup, we examine the effect of 5 different routing strategies, this time using a fixed placement of neuron clusters. The results are shown in figure 4.5.

Figure 4.5: Latency and dynamic energy consumption with different routing strategies in 3x3 mesh network

4x4 mesh experiment

For the second experiment, we perform the same tests as in the 3x3 mesh experiment. This time we use a larger neural network consisting of 1000 neurons, which is divided into 4 layers of 250 neurons each. Hardware-wise, we have 16 neuron clusters, each cluster consisting of 64 neurons. This neural network is then tested on a 4x4 mesh for 50ms of simulated time. The traffic load is heavier than in the previous experiment, with a spike rate of spikes/second per cluster, for a total of spike events. The results are shown in figures 4.6 and 4.7.
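The effect of placement seen in these experiments can be illustrated with a small sketch that scores a mapping by weighting the spike traffic between every pair of clusters with the XY-routing hop distance of the tiles they are placed on; this is essentially the cost that the mapping problem formalized in Chapter 5 tries to minimize, and the traffic numbers below are synthetic.

```python
import itertools
import random

def mapping_cost(traffic, mapping, mesh_width):
    """Communication cost of placing neuron clusters on mesh tiles: for every pair
    of clusters, weight the spike traffic between them by the Manhattan (XY-routing)
    hop distance of the tiles they are mapped to."""
    def hops(a, b):
        ax, ay = a % mesh_width, a // mesh_width
        bx, by = b % mesh_width, b // mesh_width
        return abs(ax - bx) + abs(ay - by)
    return sum(spikes * hops(mapping[i], mapping[j])
               for (i, j), spikes in traffic.items())

# toy traffic matrix for 9 clusters on a 3x3 mesh: (src, dst) -> spike count
random.seed(0)
traffic = {(i, j): random.randint(0, 50)
           for i, j in itertools.permutations(range(9), 2)}

mappings = [random.sample(range(9), 9) for _ in range(10)]   # 10 random placements
costs = [mapping_cost(traffic, m, mesh_width=3) for m in mappings]
print(min(costs), max(costs))   # placement alone changes the communication cost
```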


More information

Lecture 20: Neural Networks for NLP. Zubin Pahuja

Lecture 20: Neural Networks for NLP. Zubin Pahuja Lecture 20: Neural Networks for NLP Zubin Pahuja zpahuja2@illinois.edu courses.engr.illinois.edu/cs447 CS447: Natural Language Processing 1 Today s Lecture Feed-forward neural networks as classifiers simple

More information

Back propagation Algorithm:

Back propagation Algorithm: Network Neural: A neural network is a class of computing system. They are created from very simple processing nodes formed into a network. They are inspired by the way that biological systems such as the

More information

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina

Neural Network and Deep Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina Neural Network and Deep Learning Early history of deep learning Deep learning dates back to 1940s: known as cybernetics in the 1940s-60s, connectionism in the 1980s-90s, and under the current name starting

More information

SpiNNaker - a million core ARM-powered neural HPC

SpiNNaker - a million core ARM-powered neural HPC The Advanced Processor Technologies Group SpiNNaker - a million core ARM-powered neural HPC Cameron Patterson cameron.patterson@cs.man.ac.uk School of Computer Science, The University of Manchester, UK

More information

Lecture #11: The Perceptron

Lecture #11: The Perceptron Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Extended Junction Based Source Routing Technique for Large Mesh Topology Network on Chip Platforms

Extended Junction Based Source Routing Technique for Large Mesh Topology Network on Chip Platforms Extended Junction Based Source Routing Technique for Large Mesh Topology Network on Chip Platforms Usman Mazhar Mirza Master of Science Thesis 2011 ELECTRONICS Postadress: Besöksadress: Telefon: Box 1026

More information

2. Neural network basics

2. Neural network basics 2. Neural network basics Next commonalities among different neural networks are discussed in order to get started and show which structural parts or concepts appear in almost all networks. It is presented

More information

OASIS Network-on-Chip Prototyping on FPGA

OASIS Network-on-Chip Prototyping on FPGA Master thesis of the University of Aizu, Feb. 20, 2012 OASIS Network-on-Chip Prototyping on FPGA m5141120, Kenichi Mori Supervised by Prof. Ben Abdallah Abderazek Adaptive Systems Laboratory, Master of

More information

Particle Swarm Optimization applied to Pattern Recognition

Particle Swarm Optimization applied to Pattern Recognition Particle Swarm Optimization applied to Pattern Recognition by Abel Mengistu Advisor: Dr. Raheel Ahmad CS Senior Research 2011 Manchester College May, 2011-1 - Table of Contents Introduction... - 3 - Objectives...

More information

CS 4510/9010 Applied Machine Learning. Neural Nets. Paula Matuszek Fall copyright Paula Matuszek 2016

CS 4510/9010 Applied Machine Learning. Neural Nets. Paula Matuszek Fall copyright Paula Matuszek 2016 CS 4510/9010 Applied Machine Learning 1 Neural Nets Paula Matuszek Fall 2016 Neural Nets, the very short version 2 A neural net consists of layers of nodes, or neurons, each of which has an activation

More information

Mobile Application with Optical Character Recognition Using Neural Network

Mobile Application with Optical Character Recognition Using Neural Network Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 1, January 2015,

More information

IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM

IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM Annals of the University of Petroşani, Economics, 12(4), 2012, 185-192 185 IMPROVEMENTS TO THE BACKPROPAGATION ALGORITHM MIRCEA PETRINI * ABSTACT: This paper presents some simple techniques to improve

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

Artificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( )

Artificial Intelligence Introduction Handwriting Recognition Kadir Eren Unal ( ), Jakob Heyder ( ) Structure: 1. Introduction 2. Problem 3. Neural network approach a. Architecture b. Phases of CNN c. Results 4. HTM approach a. Architecture b. Setup c. Results 5. Conclusion 1.) Introduction Artificial

More information

Deep Learning. Volker Tresp Summer 2014

Deep Learning. Volker Tresp Summer 2014 Deep Learning Volker Tresp Summer 2014 1 Neural Network Winter and Revival While Machine Learning was flourishing, there was a Neural Network winter (late 1990 s until late 2000 s) Around 2010 there

More information

COMP 551 Applied Machine Learning Lecture 16: Deep Learning

COMP 551 Applied Machine Learning Lecture 16: Deep Learning COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all

More information

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y.

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y. Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y. Published in: Proceedings of the 2010 International Conference on Field-programmable

More information

Seismic regionalization based on an artificial neural network

Seismic regionalization based on an artificial neural network Seismic regionalization based on an artificial neural network *Jaime García-Pérez 1) and René Riaño 2) 1), 2) Instituto de Ingeniería, UNAM, CU, Coyoacán, México D.F., 014510, Mexico 1) jgap@pumas.ii.unam.mx

More information

Dynamic Routing Between Capsules

Dynamic Routing Between Capsules Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet

More information

CMPT 882 Week 3 Summary

CMPT 882 Week 3 Summary CMPT 882 Week 3 Summary! Artificial Neural Networks (ANNs) are networks of interconnected simple units that are based on a greatly simplified model of the brain. ANNs are useful learning tools by being

More information

Fault Grading FPGA Interconnect Test Configurations

Fault Grading FPGA Interconnect Test Configurations * Fault Grading FPGA Interconnect Test Configurations Mehdi Baradaran Tahoori Subhasish Mitra* Shahin Toutounchi Edward J. McCluskey Center for Reliable Computing Stanford University http://crc.stanford.edu

More information

A Large-Scale Spiking Neural Network Accelerator for FPGA Systems

A Large-Scale Spiking Neural Network Accelerator for FPGA Systems A Large-Scale Spiking Neural Network Accelerator for FPGA Systems Kit Cheung 1, Simon R Schultz 2, Wayne Luk 1 1 Department of Computing, 2 Department of Bioengineering Imperial College London {k.cheung11,

More information

Week 3: Perceptron and Multi-layer Perceptron

Week 3: Perceptron and Multi-layer Perceptron Week 3: Perceptron and Multi-layer Perceptron Phong Le, Willem Zuidema November 12, 2013 Last week we studied two famous biological neuron models, Fitzhugh-Nagumo model and Izhikevich model. This week,

More information

Report: Privacy-Preserving Classification on Deep Neural Network

Report: Privacy-Preserving Classification on Deep Neural Network Report: Privacy-Preserving Classification on Deep Neural Network Janno Veeorg Supervised by Helger Lipmaa and Raul Vicente Zafra May 25, 2017 1 Introduction In this report we consider following task: how

More information

Function approximation using RBF network. 10 basis functions and 25 data points.

Function approximation using RBF network. 10 basis functions and 25 data points. 1 Function approximation using RBF network F (x j ) = m 1 w i ϕ( x j t i ) i=1 j = 1... N, m 1 = 10, N = 25 10 basis functions and 25 data points. Basis function centers are plotted with circles and data

More information

Random Search Report An objective look at random search performance for 4 problem sets

Random Search Report An objective look at random search performance for 4 problem sets Random Search Report An objective look at random search performance for 4 problem sets Dudon Wai Georgia Institute of Technology CS 7641: Machine Learning Atlanta, GA dwai3@gatech.edu Abstract: This report

More information

Practical Tips for using Backpropagation

Practical Tips for using Backpropagation Practical Tips for using Backpropagation Keith L. Downing August 31, 2017 1 Introduction In practice, backpropagation is as much an art as a science. The user typically needs to try many combinations of

More information

Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?)

Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?) SKIP - May 2004 Hardware Neuronale Netzwerke - Lernen durch künstliche Evolution (?) S. G. Hohmann, Electronic Vision(s), Kirchhoff Institut für Physik, Universität Heidelberg Hardware Neuronale Netzwerke

More information

Theory of Computation. Prof. Kamala Krithivasan. Department of Computer Science and Engineering. Indian Institute of Technology, Madras

Theory of Computation. Prof. Kamala Krithivasan. Department of Computer Science and Engineering. Indian Institute of Technology, Madras Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture No. # 42 Membrane Computing Today, we shall consider a new paradigm

More information

Master of Engineering Preliminary Thesis Proposal For Prototyping Research Results. December 5, 2002

Master of Engineering Preliminary Thesis Proposal For Prototyping Research Results. December 5, 2002 Master of Engineering Preliminary Thesis Proposal For 6.191 Prototyping Research Results December 5, 2002 Cemal Akcaba Massachusetts Institute of Technology Cambridge, MA 02139. Thesis Advisor: Prof. Agarwal

More information

NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints

NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints NEUTRAMS: Neural Network Transformation and Co-design under Neuromorphic Hardware Constraints Yu Ji, YouHui Zhang, ShuangChen Li,PingChi, CiHang Jiang,PengQu,YuanXie, WenGuang Chen Email: zyh02@tsinghua.edu.cn,

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

Journal of Engineering Technology Volume 6, Special Issue on Technology Innovations and Applications Oct. 2017, PP

Journal of Engineering Technology Volume 6, Special Issue on Technology Innovations and Applications Oct. 2017, PP Oct. 07, PP. 00-05 Implementation of a digital neuron using system verilog Azhar Syed and Vilas H Gaidhane Department of Electrical and Electronics Engineering, BITS Pilani Dubai Campus, DIAC Dubai-345055,

More information

Climate Precipitation Prediction by Neural Network

Climate Precipitation Prediction by Neural Network Journal of Mathematics and System Science 5 (205) 207-23 doi: 0.7265/259-529/205.05.005 D DAVID PUBLISHING Juliana Aparecida Anochi, Haroldo Fraga de Campos Velho 2. Applied Computing Graduate Program,

More information

Optimization Methods for Machine Learning (OMML)

Optimization Methods for Machine Learning (OMML) Optimization Methods for Machine Learning (OMML) 2nd lecture Prof. L. Palagi References: 1. Bishop Pattern Recognition and Machine Learning, Springer, 2006 (Chap 1) 2. V. Cherlassky, F. Mulier - Learning

More information

LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS

LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Neural Networks Classifier Introduction INPUT: classification data, i.e. it contains an classification (class) attribute. WE also say that the class

More information

SOFTWARE ARCHITECTURE For MOLECTRONICS

SOFTWARE ARCHITECTURE For MOLECTRONICS SOFTWARE ARCHITECTURE For MOLECTRONICS John H. Reif Duke University Computer Science Department In Collaboration with: Allara, Hill, Reed, Seminario, Tour, Weiss Research Report Presentation to DARPA:

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

In-memory computing with emerging memory devices

In-memory computing with emerging memory devices In-memory computing with emerging memory devices Dipartimento di Elettronica, Informazione e Bioingegneria Politecnico di Milano daniele.ielmini@polimi.it Emerging memory devices 2 Resistive switching

More information

Visual object classification by sparse convolutional neural networks

Visual object classification by sparse convolutional neural networks Visual object classification by sparse convolutional neural networks Alexander Gepperth 1 1- Ruhr-Universität Bochum - Institute for Neural Dynamics Universitätsstraße 150, 44801 Bochum - Germany Abstract.

More information

Analysis and optimization methods of graph based meta-models for data flow simulation

Analysis and optimization methods of graph based meta-models for data flow simulation Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 8-1-2010 Analysis and optimization methods of graph based meta-models for data flow simulation Jeffrey Harrison

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Introduction to Neural Networks

Introduction to Neural Networks Introduction to Neural Networks What are connectionist neural networks? Connectionism refers to a computer modeling approach to computation that is loosely based upon the architecture of the brain Many

More information

Neural Networks (pp )

Neural Networks (pp ) Notation: Means pencil-and-paper QUIZ Means coding QUIZ Neural Networks (pp. 106-121) The first artificial neural network (ANN) was the (single-layer) perceptron, a simplified model of a biological neuron.

More information

Website: HOPEFIELD NETWORK. Inderjeet Singh Behl, Ankush Saini, Jaideep Verma. ID-

Website:   HOPEFIELD NETWORK. Inderjeet Singh Behl, Ankush Saini, Jaideep Verma.  ID- International Journal Of Scientific Research And Education Volume 1 Issue 7 Pages 154-162 2013 ISSN (e): 2321-7545 Website: http://ijsae.in HOPEFIELD NETWORK Inderjeet Singh Behl, Ankush Saini, Jaideep

More information

11/14/2010 Intelligent Systems and Soft Computing 1

11/14/2010 Intelligent Systems and Soft Computing 1 Lecture 8 Artificial neural networks: Unsupervised learning Introduction Hebbian learning Generalised Hebbian learning algorithm Competitive learning Self-organising computational map: Kohonen network

More information

INTRODUCTION TO DEEP LEARNING

INTRODUCTION TO DEEP LEARNING INTRODUCTION TO DEEP LEARNING CONTENTS Introduction to deep learning Contents 1. Examples 2. Machine learning 3. Neural networks 4. Deep learning 5. Convolutional neural networks 6. Conclusion 7. Additional

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

Ultra-Fast NoC Emulation on a Single FPGA

Ultra-Fast NoC Emulation on a Single FPGA The 25 th International Conference on Field-Programmable Logic and Applications (FPL 2015) September 3, 2015 Ultra-Fast NoC Emulation on a Single FPGA Thiem Van Chu, Shimpei Sato, and Kenji Kise Tokyo

More information

Natural Language Processing with Deep Learning CS224N/Ling284. Christopher Manning Lecture 4: Backpropagation and computation graphs

Natural Language Processing with Deep Learning CS224N/Ling284. Christopher Manning Lecture 4: Backpropagation and computation graphs Natural Language Processing with Deep Learning CS4N/Ling84 Christopher Manning Lecture 4: Backpropagation and computation graphs Lecture Plan Lecture 4: Backpropagation and computation graphs 1. Matrix

More information

Administrative. Assignment 1 due Wednesday April 18, 11:59pm

Administrative. Assignment 1 due Wednesday April 18, 11:59pm Lecture 4-1 Administrative Assignment 1 due Wednesday April 18, 11:59pm Lecture 4-2 Administrative All office hours this week will use queuestatus Lecture 4-3 Where we are... scores function SVM loss data

More information

Image Compression: An Artificial Neural Network Approach

Image Compression: An Artificial Neural Network Approach Image Compression: An Artificial Neural Network Approach Anjana B 1, Mrs Shreeja R 2 1 Department of Computer Science and Engineering, Calicut University, Kuttippuram 2 Department of Computer Science and

More information

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing EE878 Special Topics in VLSI Computer Arithmetic for Digital Signal Processing Part 6c High-Speed Multiplication - III Spring 2017 Koren Part.6c.1 Array Multipliers The two basic operations - generation

More information

Nearly-optimal associative memories based on distributed constant weight codes

Nearly-optimal associative memories based on distributed constant weight codes Nearly-optimal associative memories based on distributed constant weight codes Vincent Gripon Electronics and Computer Enginering McGill University Montréal, Canada Email: vincent.gripon@ens-cachan.org

More information

Feedback Alignment Algorithms. Lisa Zhang, Tingwu Wang, Mengye Ren

Feedback Alignment Algorithms. Lisa Zhang, Tingwu Wang, Mengye Ren Feedback Alignment Algorithms Lisa Zhang, Tingwu Wang, Mengye Ren Agenda Review of Back Propagation Random feedback weights support learning in deep neural networks Direct Feedback Alignment Provides Learning

More information

CHAPTER 7 MASS LOSS PREDICTION USING ARTIFICIAL NEURAL NETWORK (ANN)

CHAPTER 7 MASS LOSS PREDICTION USING ARTIFICIAL NEURAL NETWORK (ANN) 128 CHAPTER 7 MASS LOSS PREDICTION USING ARTIFICIAL NEURAL NETWORK (ANN) Various mathematical techniques like regression analysis and software tools have helped to develop a model using equation, which

More information

Silicon Auditory Processors as Computer Peripherals

Silicon Auditory Processors as Computer Peripherals Silicon uditory Processors as Computer Peripherals John Lazzaro, John Wawrzynek CS Division UC Berkeley Evans Hall Berkeley, C 94720 lazzaro@cs.berkeley.edu, johnw@cs.berkeley.edu M. Mahowald, Massimo

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION Rapid advances in integrated circuit technology have made it possible to fabricate digital circuits with large number of devices on a single chip. The advantages of integrated circuits

More information

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors

A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors A Low Energy Clustered Instruction Memory Hierarchy for Long Instruction Word Processors Murali Jayapala 1, Francisco Barat 1, Pieter Op de Beeck 1, Francky Catthoor 2, Geert Deconinck 1 and Henk Corporaal

More information

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity Abstract: This project aims at creating a benchmark for Deep Learning (DL) algorithms

More information

Parallel Evaluation of Hopfield Neural Networks

Parallel Evaluation of Hopfield Neural Networks Parallel Evaluation of Hopfield Neural Networks Antoine Eiche, Daniel Chillet, Sebastien Pillement and Olivier Sentieys University of Rennes I / IRISA / INRIA 6 rue de Kerampont, BP 818 2232 LANNION,FRANCE

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Edge and local feature detection - 2. Importance of edge detection in computer vision

Edge and local feature detection - 2. Importance of edge detection in computer vision Edge and local feature detection Gradient based edge detection Edge detection by function fitting Second derivative edge detectors Edge linking and the construction of the chain graph Edge and local feature

More information