FPGA DESIGN OF A MULTICORE NEUROMORPHIC PROCESSING SYSTEM. Thesis. Submitted to. The School of Engineering of the UNIVERSITY OF DAYTON


FPGA DESIGN OF A MULTICORE NEUROMORPHIC PROCESSING SYSTEM

Thesis
Submitted to
The School of Engineering of the
UNIVERSITY OF DAYTON

In Partial Fulfillment of the Requirements for
The Degree of
Master of Science in Electrical Engineering

By
Bin Zhang

Dayton, Ohio
May, 2016

FPGA DESIGN OF A MULTICORE NEUROMORPHIC PROCESSING SYSTEM

Name: Zhang, Bin

APPROVED BY:

Tarek M. Taha, Ph.D.
Advisory Committee Chairman
Associate Professor
Electrical and Computer Engineering

Keigo Hirakawa, Ph.D.
Committee Member
Assistant Professor
Electrical and Computer Engineering

Eric Balster, Ph.D.
Committee Member
Assistant Professor
Electrical and Computer Engineering

John G. Weber, Ph.D.
Associate Dean
School of Engineering

Eddy M. Rojas, Ph.D., M.A., P.E.
Dean, School of Engineering

Copyright by
Bin Zhang
All rights reserved
2016

ABSTRACT
FPGA DESIGN OF A MULTICORE NEUROMORPHIC PROCESSING SYSTEM
Name: Zhang, Bin
University of Dayton
Advisor: Dr. Tarek M. Taha

Neuromorphic computing architectures have developed rapidly in recent years. The FPGA implementation of the neuromorphic network processor presented here is 3x and 127x faster than an Intel E8400 processor for edge detection and ECG applications, respectively. Considering resource utilization and system stability, a hardware-controlled communication routing network is a better choice than a time-delay-based routing network. However, the separate data lines prevent the hardware-controlled communication routing network from scaling to a large network.

ACKNOWLEDGMENTS
In this work, the basic static router was designed by our team; the new static routing design and the comparison were carried out by me. I would like to express my deep appreciation and thanks to my adviser, Professor Tarek M. Taha, who has the attitude and skills of a true scientist. His approach to research was highly inspiring. I would also like to thank my teammates, Yangjie Qi and Hua Chen, for their efforts in this work, and Md Raqibul Hasan, who gave us much useful advice and help.

To my mentor and dear friends at UD. Thank you for all of your support along the way.

TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGMENTS
DEDICATION
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
  1.1 NEURAL PROCESSOR WORK PROCESSING
  1.2 MULTICORE NEUROMORPHIC PROCESSOR
CHAPTER 2 ARCHITECTURE DESIGN OF CORE AND BASIC NETWORK
  2.1 CORE DESIGN
  2.2 ROUTER DESIGN
CHAPTER 3 STATIC ROUTING DESIGN
  3.1 BASIC DESIGN OF STATIC ROUTING
  3.2 STATIC ROUTING DESIGN PROBLEM
  3.3 TIME DELAY BASED ROUTING PROTOCOL ROUTER
  3.4 HARDWARE CONTROLLED COMMUNICATION PROTOCOL ROUTER
CHAPTER 4 EXPERIMENTAL SETUP
CHAPTER 5 RESULTS
  5.1 IMPLEMENTATION OF NEURAL CORE AND NETWORKS
  5.2 COMPARISON OF TWO KINDS OF STATIC ROUTING PROTOCOLS
  5.3 TWO APPLICATIONS RESULTS VERIFICATION
  5.4 PERFORMANCE COMPARISON WITH A RISC PROCESSOR
CHAPTER 6 CONCLUSION
BIBLIOGRAPHY

LIST OF FIGURES
Figure 1.1: Example of the neural network structure
Figure 1.2: Digital design of neuromorphic core
Figure 1.3: Multicore neuromorphic network
Figure 2.1: Data bus
Figure 2.2: State machine
Figure 2.3: Structure of dynamic router
Figure 3.1: Static routing
Figure 3.2: Multi-sources conflict
Figure 3.3: A simple handshaking protocol
Figure 3.4: Hardware controlled communication protocol router signals
Figure 5.1: Example of SignalTap II results

LIST OF TABLES
Table 1.1: Comparison of neuromorphic systems and traditional computing platforms
Table 5.1: Single core FPGA resource utilization
Table 5.2: Multicore FPGA resource utilization
Table 5.3: Dynamic and static network FPGA resource utilization comparison
Table 5.4: Two kinds of routing network FPGA resource utilization
Table 5.5: ECG process time of one pattern data
Table 5.6: ECG process result of one pattern data
Table 5.7: Edge detection result of some random pixels
Table 5.8: Applications throughput

CHAPTER 1 INTRODUCTION

Neuromorphic computing architectures have developed rapidly in recent years. These architectures are well suited to massively parallel applications such as image processing, recognition, mining, and synthesis (RMS) workloads, and pattern recognition. Arthur et al. [1] showed that four applications (an autonomous virtual robot driver, a pong player, virtual digit recognition, and auto-association) work well on a single core containing 1024 axons, and synapses.

1.1 NEURAL PROCESSOR WORK PROCESSING
There are two basic forms of neural network processing: feed-forward and feedback. The feedback form is used to train the system by modifying the weights. The feed-forward form is the most common, as shown in Figure 1.1.

Figure 1.1: Example of the neural network structure.
Each layer is processed by one core. The axons are represented by $x_i$, the neurons by $y_j$, and the weight of the $i$th axon for the $j$th neuron by $w_{i,j}$. Each axon value is multiplied by its weight for the corresponding neuron, and the activation function applied to the sum of these products gives the final value of each neuron:

$$v_j = \sum_i w_{i,j}\, x_i + b \qquad (1.1)$$

$$y_j = f(v_j) \qquad (1.2)$$

In Equation (1.2), $f$ is the activation function, normally a sigmoid function,

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (1.3)$$

and $b$ is the bias, which the network adjusts along with its weights. A sigmoid function is used because it keeps each output between 0 and 1. For a layer with $i$ axons and $j$ neurons, the weight matrix holds $i \times j$ values. Because moving data costs energy, transferring large amounts of weight data is expensive. A neuromorphic processor avoids this by storing the weights inside each core, which reduces power consumption compared to conventional systems.
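As a minimal software sketch of Equations (1.1) to (1.3), the function below evaluates one layer (the work of one core) in floating point. The layer size, weights, and bias in the example are hypothetical values chosen only for illustration; the hardware itself works in fixed point.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic sigmoid, Equation (1.3): maps any real v into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def layer_forward(x, w, b):
    """Evaluate one layer per Equations (1.1) and (1.2).

    x: list of the i axon values x_i
    w: i-by-j weight matrix as nested lists, w[i][j] = w_{i,j}
    b: bias added to every neuron's weighted sum
    Returns the list of j neuron outputs y_j.
    """
    n_axons, n_neurons = len(w), len(w[0])
    assert len(x) == n_axons, "one input value per axon"
    outputs = []
    for j in range(n_neurons):
        v_j = sum(w[i][j] * x[i] for i in range(n_axons)) + b  # Equation (1.1)
        outputs.append(sigmoid(v_j))                            # Equation (1.2)
    return outputs


# Hypothetical 3-axon, 2-neuron layer: the weight matrix holds 3 x 2 = 6 values.
x = [1.0, 0.5, -1.0]
w = [[0.2, -0.4],
     [0.6, 0.1],
     [0.0, 0.3]]
print(layer_forward(x, w, b=0.1))
```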

1.2 MULTICORE NEUROMORPHIC PROCESSOR
The digital design of the core is shown in Figure 1.2. The transmission from pre-synaptic neuron inputs to post-synaptic neuron outputs in a neuromorphic core is similar to signal transmission between nerve cells in the human nervous system. The basic work structure is shown in Figure 1.1.
Figure 1.2: Digital design of neuromorphic core (input unit, decoder, digital synaptic memory array, multiply-add-accumulate units, control unit, activation function, and routing unit).
Neuromorphic processors can be built from many such cores connected by an on-chip routing network [2, 3, 4], as shown in Figure 1.3.
Figure 1.3: Multicore neuromorphic network (a 4×4 grid of neural cores, NC, each attached to a router, R).

In Figure 1.3, each neural core connects to a router that has links in four directions to neighboring routers. To assess the performance of the neural core designs, we compared their estimated performance and power consumption with those of current high-performance processors. Two platforms were examined: the six-core Intel Xeon X5650 processor and the NVIDIA Tesla M2070 GPGPU. We measured the peak throughput of these systems in neurons per second [5].

Table 1.1: Comparison of neuromorphic systems and traditional computing platforms.
Columns: configuration, number of chips, chip area (mm²), % time active, power (W), power density (mW/mm²), and system power efficiency over Xeon.
Rows: memristor core, static random-access memory (SRAM) core, NVIDIA M2070, and Intel Xeon X5650.

Table 1.1 compares the four systems when evaluating a neural network with 25,600 neurons, each having 1024 inputs, with the full network evaluated at 100,000 iterations/s. The results show that the specialized neuromorphic systems are significantly more power- and area-efficient than traditional high-performance computing platforms when running neural networks.

In this work, we present a field-programmable gate array (FPGA) implementation of the multicore digital neuromorphic processor. The system is evaluated on several applications and compared with an Intel processor. The digital neuromorphic system is very similar to the analog memristor neuromorphic system in terms of the control logic in the cores, the inputs and outputs of the cores, and the routing system. The key difference is in each core's neuron compute circuits (memristor based, or SRAM/adder/multiplier based). Hence, implementing the digital design on an FPGA will also help determine the peripheral and routing logic for the analog cores.

CHAPTER 2 ARCHITECTURE DESIGN OF CORE AND BASIC NETWORK

2.1 CORE DESIGN
As Figure 1.2 shows, the neuromorphic core performs the complete feed-forward neural network computation. Each core processes a collection of N neurons, with each neuron having up to M input axons. The input synaptic weights ($w_{i,j}$) are stored in a memory array. These synaptic values are multiplied by the pre-synaptic input values ($x_i$) and summed into an accumulator. Once the final neural output values are generated, they pass through an activation function unit that implements Equation (1.2). The output of the activation unit goes to a routing unit on the core, which looks up the destination of the neuron and sends a packet containing the neuron output and the neuron destination to the on-chip router.

2.1.1 Components function
To complete this process, the neuromorphic core is divided into six components: input dispatch, weight memory, calculation, control unit, activation function, and output package. The function of each component, and of the data bus that connects the cores, is given below:

(1) Data bus
The data bus between cores consists of three fields: a valid bit, address bits, and data bits, as shown in Figure 2.1.
Figure 2.1: Data bus.
The most significant bit is the valid bit, which indicates whether the data is valid. The address bits contain the coordinate of the next-level router and the weight address. For an axon, the data bits are the pre-synaptic input value; for a neuron, they are the output of the activation function.
(2) Input dispatch
This component decides whether incoming data is useful by testing the valid bit. If the data is valid, it sends the address bits to the weight memory and the data bits to the calculation unit, and at the same time activates the control unit.
(3) Weight memory
This component stores the weight values in a memory array. It receives the address bits from the input dispatch, looks up the corresponding weight value, and sends it to the calculation unit.
(4) Calculation
For each neuron, this component carries out the calculation of Equation (1.1). When the calculation finishes, the process pauses until the destination core in the next layer is available to receive data. The calculation results are then sent from each register to the activation function.

(5) Activation function
This component evaluates Equation (1.2). To avoid unnecessary exponential calculations, a lookup table is used. This reduces the number of logic elements required for the calculation, at the cost of more registers to store the table.
(6) Output package
This component packs each result into the data bus format (a valid bit, address bits taken from the routing table, and data bits taken from the calculation) and sends it out.
(7) Control unit
This component controls all the components above and makes them operate in order. It includes a state machine and control components that provide the total neuron number and the delay time. The state machine is a Moore machine, as shown in Figure 2.2.

Figure 2.2: State machine.
The state machine contains eight states: IDLE, Input Calculate, Bias Calculate, Calculate Finish, Holding, ALU Read, Output Read, and Finish.

2.2 ROUTER DESIGN
The routing network can be implemented with either static or dynamic routing. Static routing is explored in Chapter 3. In dynamic routing, each core sends out a packet with a destination header, which every router along the way examines to direct the packet toward its destination. Dynamic routing is generally resource and power intensive, requiring buffers, a crossbar switch, and a switch allocator in every router. The structure of a dynamic router is shown in Figure 2.3.

Figure 2.3: Structure of dynamic router.
In Figure 2.3, the router has five input ports and five output ports. Every clock cycle, the data from each of the five input ports is dispatched to the buffer of its destination direction, and each output port sends out one message.
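The dynamic router behavior described above (a header examined at every hop, five inputs feeding per-direction buffers, one message per output port per cycle) can be sketched as a cycle-level model. The thesis does not state which routing algorithm its dynamic router uses, so dimension-order (XY) routing, the port names, and the convention that y grows toward north are all assumptions made for this illustration.

```python
from collections import deque

# Hypothetical port names for a 5-port mesh router: four neighbours plus the local core.
PORTS = ("north", "south", "east", "west", "local")


def route(here, dest):
    """Dimension-order (XY) routing, assumed here: fix x first, then y, then deliver."""
    (x, y), (dx, dy) = here, dest
    if dx > x:
        return "east"
    if dx < x:
        return "west"
    if dy > y:
        return "north"   # assumption: y increases toward north
    if dy < y:
        return "south"
    return "local"


class DynamicRouter:
    def __init__(self, coord):
        self.coord = coord
        # One FIFO buffer per output direction.
        self.out_buffers = {p: deque() for p in PORTS}

    def cycle(self, arrivals):
        """Advance one clock cycle.

        arrivals: dict input_port -> packet (dest_coord, payload), at most one per port.
        Each arriving packet is placed in the buffer of the direction chosen from its
        header; then every output port with buffered data sends exactly one packet.
        Returns a dict output_port -> packet sent this cycle.
        """
        for packet in arrivals.values():
            if packet is not None:
                self.out_buffers[route(self.coord, packet[0])].append(packet)
        return {p: buf.popleft() for p, buf in self.out_buffers.items() if buf}


router = DynamicRouter((1, 1))
print(router.cycle({"west": ((3, 1), "a"), "south": ((3, 2), "b"), "local": ((1, 1), "c")}))
print(router.cycle({}))   # the second east-bound packet leaves one cycle later
```

Packets "a" and "b" both need the east output, so "b" waits in the east buffer for one cycle; this buffering is exactly the resource cost that static routing avoids.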
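Returning to the core components of Section 2.1, the activation function and output package steps can be sketched in software, assuming the 16-bit fixed-point output format reported in Section 5.3 (eight integer bits and eight fractional bits, treated here as unsigned since sigmoid outputs are non-negative). The table range of -8 to 8, its 256 entries, and the 10-bit address field are hypothetical choices for illustration, not parameters taken from the design.

```python
import math

FRAC_BITS = 8                  # 16-bit word: 8 integer bits, 8 fractional bits (Section 5.3)
LUT_MIN, LUT_MAX = -8.0, 8.0   # hypothetical input range covered by the table
LUT_SIZE = 256                 # hypothetical number of table entries
STEP = (LUT_MAX - LUT_MIN) / LUT_SIZE
ADDR_BITS, DATA_BITS = 10, 16  # hypothetical field widths of the data bus


def to_fixed(value: float) -> int:
    """Round a non-negative real value to an unsigned 16-bit word with 8 fractional bits."""
    return max(0, min(0xFFFF, round(value * (1 << FRAC_BITS))))


def from_fixed(word: int) -> float:
    """Interpret a 16-bit word as integer.fraction, as read from OUT[15..0]."""
    return word / (1 << FRAC_BITS)


# Precomputed once, like an on-chip table: entry k holds sigmoid at the centre of bin k.
SIGMOID_LUT = [to_fixed(1.0 / (1.0 + math.exp(-(LUT_MIN + (k + 0.5) * STEP))))
               for k in range(LUT_SIZE)]


def activation(v: float) -> int:
    """Look up sigmoid(v) instead of computing an exponential; saturate outside the range."""
    k = int((v - LUT_MIN) // STEP)
    return SIGMOID_LUT[max(0, min(LUT_SIZE - 1, k))]


def pack(address: int, data: int) -> int:
    """Output package: valid bit (most significant) | address bits | data bits."""
    assert 0 <= address < (1 << ADDR_BITS) and 0 <= data < (1 << DATA_BITS)
    return (1 << (ADDR_BITS + DATA_BITS)) | (address << DATA_BITS) | data


def dispatch(word: int):
    """Input dispatch: ignore words whose valid bit is clear, else split address and data."""
    valid = (word >> (ADDR_BITS + DATA_BITS)) & 1
    if not valid:
        return None
    address = (word >> DATA_BITS) & ((1 << ADDR_BITS) - 1)
    data = word & ((1 << DATA_BITS) - 1)
    return address, data


word = pack(5, activation(0.0))
print(dispatch(word), from_fixed(dispatch(word)[1]))
```

The table trades logic for storage in the same way the text describes: each lookup is a single indexed read, while the 256 precomputed entries must be held in registers or memory.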

CHAPTER 3 STATIC ROUTING DESIGN

3.1 BASIC DESIGN OF STATIC ROUTING
In static routing, a dedicated connection is set up between a source core and its destination cores. When a particular neural network is mapped onto the multicore system, the communication pattern between the cores becomes deterministic. The connectivity needed between the cores is therefore known in advance, so static routing between the cores can be used (similar to the routing between configurable logic blocks on an FPGA). This approach requires routing switches, which in this design are n-type metal-oxide-semiconductor field-effect transistors (MOSFETs). Each connection within a routing switch requires a memory cell so that the path can be reconfigured for a particular network (Figure 3.1). This configuration is loaded when the system starts.

Figure 3.1: Static routing (a 5×5 crossbar with an initialization buffer).
The key benefit of static routing is that it does not require dynamic routing logic, which can significantly reduce power consumption. If channel utilization is low, however, static routing could require more area than dynamic routing. A previous study showed that static routers consume significantly less power and area than dynamic routers [5]. To improve the quality of the FPGA implementation, the neuromorphic cores are connected on the FPGA with static routing.

3.2 STATIC ROUTING DESIGN PROBLEM
Although the benefit of static routing is clear, our original static routing design suffers from a major drawback. When multiple cores send data to one core at the same time, as Figure 3.2 shows, the input port of the destination core cannot receive the different data simultaneously, because the structure is simple and there is no arrangement for scheduling signal transmission.

Figure 3.2: Multi-sources conflict.
In Figure 3.2, the routing switches from three directions are all turned on. These switches cannot be controlled while data is being transferred, so data conflict is inevitable in this design. Two further observations make the problem worse:
(1) In a neural network, both sending data from multiple cores to one core and sending data from one core to multiple cores are frequent.
(2) For the system to work efficiently within a limited chip area, network size must be balanced against how frequently each router is used.
Given these two points, conflicts between cores sharing a static router will be frequent, which is a critical problem. Two approaches are used to solve it: a time-delay-based routing protocol and a hardware-controlled communication protocol.
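The fixed switch configuration of Section 3.1 and the conflict of Figure 3.2 can be illustrated with a small model of the 5×5 crossbar. The port names and the boolean-matrix representation of the configuration memory cells are assumptions for illustration; in the model, "north" plays the role of the up output in Figure 3.2.

```python
# Hypothetical port names for the 5x5 static crossbar of Figure 3.1.
PORTS = ("north", "south", "east", "west", "local")


def load_configuration(connections):
    """Build the crossbar configuration bits once, at system start-up.

    connections: iterable of (input_port, output_port) pairs required by the mapped network.
    Returns a 5x5 boolean matrix: bits[i][o] is True when the switch from input i to
    output o is turned on. The bits stay fixed while data is being transferred.
    """
    bits = [[False] * len(PORTS) for _ in PORTS]
    for src, dst in connections:
        bits[PORTS.index(src)][PORTS.index(dst)] = True
    return bits


def shared_outputs(bits):
    """Outputs driven by more than one enabled input.

    These are the points where multi-source conflicts occur if the sources transmit
    at the same time, because the switches cannot be changed mid-transfer.
    """
    return [PORTS[o] for o in range(len(PORTS))
            if sum(bits[i][o] for i in range(len(PORTS))) > 1]


# Figure 3.2-style case: three inputs are switched onto the same output.
config = load_configuration([("west", "north"), ("east", "north"),
                             ("local", "north"), ("south", "local")])
print(shared_outputs(config))   # ['north']
```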

3.3 TIME DELAY BASED ROUTING PROTOCOL ROUTER
The most significant feature of static routing is that it is predetermined. Not only the routing switches but also the connections between all routers are scheduled before the system starts processing. For multi-source conflicts, if data from different sources are sent at different times, so that only one data transfer is in progress at any time, no conflict occurs. In Figure 3.2, for example, the up output port can transfer data packages A, B, and C in separate periods. Let $T_A$, $T_B$, and $T_C$ be the transfer time of each source, measured from sending the package head to sending the package tail. A possible delay schedule for the sources is then $D_A = 0$, $D_B = T_A$, $D_C = T_A + T_B$. This simple conflict can be scheduled easily.
Brandner et al. [6] designed a static routing system for a network-on-chip (NoC) that solves the problems caused by traditional communication methods among related cores. Their solution is very similar to this time delay protocol. They summarized two principles that also apply to this protocol [6]:
(1) No two routes start or end at the same time instant in the communication schedule;
(2) Any two routes scheduled concurrently use disjoint sets of communication links.
Given these two principles, a carefully constructed schedule is essential. To minimize system processing time, the transfer time of each package needs to be estimated as accurately as possible. On the other hand, a single mistake in the schedule will cause a conflict. To reduce the chance of such conflicts, extra time can be added to each package transfer time as an error tolerance.
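The delay schedule above, including the optional error-tolerance margin, can be computed mechanically for the sources that share one output link. This is a minimal sketch assuming transfer times are known in clock cycles; the example times and the margin value are hypothetical.

```python
def delay_schedule(transfer_times, margin=0):
    """Start delays D_k for sources that share one output link.

    Each source starts after all earlier sources have finished, plus a guard margin
    for error tolerance:
    D_A = 0, D_B = T_A + margin, D_C = T_A + T_B + 2 * margin, ...
    """
    delays, t = [], 0
    for transfer in transfer_times:
        delays.append(t)
        t += transfer + margin
    return delays


def is_conflict_free(delays, transfer_times):
    """True if no two transfers on the shared link overlap in time."""
    spans = sorted(zip(delays, (d + t for d, t in zip(delays, transfer_times))))
    return all(end <= next_start
               for (_, end), (next_start, _) in zip(spans, spans[1:]))


times = [4, 5, 3]                          # hypothetical T_A, T_B, T_C in cycles
print(delay_schedule(times))               # [0, 4, 9]
print(delay_schedule(times, margin=1))     # [0, 5, 11]
print(is_conflict_free([0, 3, 9], times))  # False: B starts before A finishes
```

The last line shows why an underestimated transfer time is fatal under this protocol: a schedule that is off by a single cycle produces an overlap, which is what the margin is meant to absorb.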

Generally, the time delay protocol needs a careful and thoroughly considered schedule. However, because its architecture is the same as the basic design, it fully inherits the advantages of static routing.

3.4 HARDWARE CONTROLLED COMMUNICATION PROTOCOL ROUTER
A common solution to multi-source conflicts in an NoC is a hardware communication protocol, which establishes efficient communication between data senders and data receivers. The protocol used here is based on handshaking.

3.4.1 Handshaking
Handshaking is a simple negotiation method between two components. A basic handshaking protocol has only two signals, a require signal and an acknowledge signal, as shown in Figure 3.3.
Figure 3.3: A simple handshaking protocol.
In Figure 3.3, the sender is about to send a message to the receiver. It first sends a require signal to ask the receiver for permission. If the receiver is available to receive the message, it sends back an acknowledge signal that allows the transmission. The sender then sends the message, which cannot conflict with other messages.

3.4.2 Hardware controlled communication protocol router design
With the handshaking protocol, require signals, acknowledge signals, and data lines must be carried on separate lines. The signals of one router are shown in Figure 3.4.
Figure 3.4: Hardware controlled communication protocol router signals.
In Figure 3.4, each channel consists of two unidirectional links, one for each direction. Each direction carries a require signal, an acknowledge signal, and a data line. The total signal width of one router is

$$W_s = 2 N_c (2 + W_d) \qquad (3.1)$$

where $W_s$ is the sum of the widths of all signals, $N_c$ is the number of channels, and $W_d$ is the data bus width. The term $2 + W_d$ counts the require wire, the acknowledge wire, and the $W_d$-bit data bus in one direction, and the factor 2 accounts for the two directions of each channel. As the number of cores in the network increases, the signal width of each router becomes large, which increases the number of wires required on the FPGA.
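Equation (3.1) is simple enough to check numerically. The channel count and data bus widths below are hypothetical examples (five channels would correspond to four neighbours plus the local core); the thesis does not give the exact widths used.

```python
def router_signal_width(num_channels: int, data_width: int) -> int:
    """Equation (3.1): W_s = 2 * N_c * (2 + W_d).

    Each channel has two unidirectional links; each link carries one require wire,
    one acknowledge wire, and a W_d-bit data bus.
    """
    return 2 * num_channels * (2 + data_width)


# Hypothetical example: 5 channels with a 32-bit data bus.
print(router_signal_width(5, 32))   # 340
```

Because $W_d$ appears inside the product, every bit added to the data bus (for example, wider router coordinates in a larger network) costs $2 N_c$ additional wires per router.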

Under the handshaking protocol, each core sends a require signal to the next-layer core after computing its neuron outputs, and the next-layer core responds with an acknowledge signal. As shown in Figure 3.3, the sender holds its neural outputs until it receives permission. With this protocol, communication between the cores completely resolves the multi-source conflict.
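A cycle-level sketch of the handshake with several senders and one receiver shows why conflicts cannot occur: the link is granted to only one requester at a time, and every other sender holds its data. The thesis does not specify how a receiver chooses among simultaneous requests, so the fixed-priority choice (alphabetical sender name) and the fixed transfer length are assumptions.

```python
def simulate_handshake(messages, transfer_cycles=1):
    """Cycle-level sketch of require/acknowledge handshaking with one receiver.

    messages: dict sender -> payload. Every sender raises its require line at cycle 0
    and holds its payload until acknowledged. Whenever the receiver is idle it
    acknowledges one requesting sender (lowest name first, a hypothetical
    fixed-priority choice), and that sender's data then occupies the link for
    transfer_cycles cycles.
    Returns a list of (ack_cycle, sender, payload) in delivery order.
    """
    waiting = sorted(messages)          # senders whose require line is asserted
    log, cycle, busy_until = [], 0, 0
    while waiting:
        if cycle >= busy_until:         # receiver free: acknowledge one requester
            sender = waiting.pop(0)
            log.append((cycle, sender, messages[sender]))
            busy_until = cycle + transfer_cycles
        cycle += 1                      # everyone else keeps holding its data
    return log


print(simulate_handshake({"A": 1, "B": 2, "C": 3}, transfer_cycles=3))
```

Unlike the time delay schedule, nothing here depends on knowing transfer times in advance; the cost is the extra require and acknowledge wires counted in Equation (3.1).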

CHAPTER 4 EXPERIMENTAL SETUP

The multicore neuromorphic processor was implemented on an Altera DE2 board containing an Altera Cyclone IV FPGA (part EP4CE115). The network was written in Verilog, simulated using ModelSim, compiled using Quartus II, and finally tested on the FPGA board. The number of neurons per core, bits per neuron, and synapses per neuron were compile-time parameters, so that different design options could be examined. The synaptic memory was implemented using the on-chip memory of the FPGA.
To evaluate the performance of the FPGA implementation, we applied it to edge detection and electrocardiography (ECG) signal analysis. The applications are described below.
Edge detection: This application identifies points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. Changes or discontinuities in luminance are fundamentally important primitive characteristics of an image because they often indicate the physical extent of objects. To evaluate the value of each pixel of the edge image, we used neural networks with the configurations 9->11->1 (9 inputs, 11 neurons in the hidden layer, and one output neuron), 9->11->1, and 2->5->1.

ECG: Several applications require continuous ECG monitoring of a person. These include several variants of implanted heart devices and remote health monitoring systems for elderly people or for patients who wear body sensors. These devices run on battery power, so it is extremely important for them to have very low power consumption. A study examined the Arrhythmia Data Set, which consists of 220 patterns in 16 classes [7]. Each pattern consists of 279 attributes (features). For this application we used a 279->50->16 neural network configuration.

CHAPTER 5 RESULTS

5.1 IMPLEMENTATION OF NEURAL CORE AND NETWORKS
Two versions of the core were implemented, one with 16 neurons per core and the other with 64 neurons per core. Both cores had 512 synapses per neuron and ran at 50 MHz. Table 5.1 shows the FPGA resource utilization of these two cores.

Table 5.1: Single core FPGA resource utilization.
                    16 neuron core    64 neuron core
Logic elements      1,844             4,924
Registers           1,105             4,177
Memory bits         131,072           524,288
Multipliers

The neuromorphic network is connected by time delay static routing on the FPGA. The FPGA was able to fit a 3×3 grid of the 16 neuron cores and a 2×2 grid of the 64 neuron cores (all at 50 MHz). Table 5.2 shows the total FPGA resource utilization of these two multicore systems, including both the cores and the static routing logic. The results indicate that the on-chip multipliers are the limiting factor in scaling the number of cores.

Table 5.2: Multicore FPGA resource utilization.
                    Network for edge detection    Network for ECG
Routing method      Static routing                Static routing
Core utilization    9                             4
Neurons per core    16                            64
Logic elements      17,531 (15%)                  19,828 (17%)
Registers           9,896 (8%)                    16,688 (14%)
Memory bits         1,179,648 (29%)               2,097,152 (52%)
Multipliers         288 (54%)                     512 (96%)

To compare dynamic and static routing, a 2×2 network of 64-neuron cores with dynamic routing was also built on the FPGA board. Its FPGA resource utilization is shown in Table 5.3.

Table 5.3: Dynamic and static network FPGA resource utilization comparison.
                        Dynamic network     Static network
Core utilization        4                   4
Neurons per core        64                  64
Total logic elements    45,889 (40%)        19,828 (17%)
Total registers         35,787 (30%)        16,688 (14%)
Total memory bits       2,101,052 (52%)     2,097,152 (52%)
Multipliers             512 (96%)           512 (96%)

Table 5.3 shows the main advantage of static routing over dynamic routing: the dynamic network uses more than twice as many logic elements as the static network. In this design, each dynamic router contains more buffers and logic elements. Therefore, dynamic routing designs require a greater chip area than static routing designs.
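The percentages in Table 5.2 can be reproduced from the capacities of the EP4CE115 device, which also makes the multiplier bottleneck visible. The capacities below are taken from our reading of Altera's Cyclone IV device documentation (114,480 logic elements, 3,981,312 embedded memory bits, 532 embedded 9-bit multiplier elements). Counting one register per logic element and truncating rather than rounding the percentages are assumptions; with them, the computed values match the table.

```python
# Cyclone IV EP4CE115 capacities (assumed from Altera device documentation).
CAPACITY = {"logic elements": 114_480, "registers": 114_480,
            "memory bits": 3_981_312, "multipliers": 532}


def utilization(used):
    """Percentage of each resource used, truncated to a whole percent."""
    return {name: 100 * count // CAPACITY[name] for name, count in used.items()}


edge = {"logic elements": 17_531, "registers": 9_896,
        "memory bits": 1_179_648, "multipliers": 288}
ecg = {"logic elements": 19_828, "registers": 16_688,
       "memory bits": 2_097_152, "multipliers": 512}

print(utilization(edge))
print(utilization(ecg))
```

In the ECG network the multipliers reach 96% while logic elements stay below 20%, which is why the multipliers, rather than logic, limit the number of 64-neuron cores.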

5.2 COMPARISON OF TWO KINDS OF STATIC ROUTING PROTOCOLS
A static routing network with the hardware controlled communication protocol was implemented on the FPGA. This network contains a 2×2 grid of 64 neuron cores, the same as the earlier network with the time delay routing protocol.

Table 5.4: Two kinds of routing network FPGA resource utilization (2×2 grid of 64 neuron cores).
               Hardware controlled routing    Time delay routing
Logic          (20%)                          (17%)
Registers      (16%)                          (15%)
Memory bits    (53%)                          (53%)
Multipliers    512 (96%)                      512 (96%)

Table 5.4 shows the total FPGA resource utilization of the two kinds of routing network. Even though the hardware controlled router is more complex than the time delay router, their FPGA resource utilization is similar. Table 5.5 shows the processing time of one ECG pattern.

Table 5.5: ECG process time of one pattern data (2×2 grid of 64 neuron cores).
                               Process time (clock cycles; 1 cycle = 20 ns at 50 MHz)
Hardware controlled routing    359
Time delay routing             357

In Table 5.5, the hardware controlled routing network takes slightly longer than the time delay routing network. The ECG network used in these two systems is simple, so a time schedule can be built as easily and accurately as for a direct core-to-core network without any routing. Because the two networks have nearly identical processing times, their ECG throughputs are also close.
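To see how small the gap in Table 5.5 is in absolute terms, the cycle counts can be converted to time and to a pattern rate using the 50 MHz clock from Section 5.1. This sketch assumes one pattern is processed at a time with no overlap between patterns; the resulting rates are illustrative only and are not the measured throughputs of Table 5.8.

```python
CLOCK_HZ = 50_000_000   # 50 MHz core clock (Section 5.1)


def patterns_per_second(cycles_per_pattern: int, clock_hz: int = CLOCK_HZ) -> float:
    """Throughput if patterns are processed strictly one after another."""
    return clock_hz / cycles_per_pattern


for name, cycles in (("Hardware controlled routing", 359), ("Time delay routing", 357)):
    print(f"{name}: {cycles} cycles = {cycles * 1e9 / CLOCK_HZ:.0f} ns, "
          f"{patterns_per_second(cycles):,.0f} patterns/s")
```

The two-cycle difference amounts to 40 ns per pattern, a throughput difference of well under 1%.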

5.3 TWO APPLICATIONS RESULTS VERIFICATION
Since the ECG application requires two layers of neurons, we used the 4 core system with 64 neurons per core on the FPGA, with each layer of neurons mapped to one core. Since the edge detection application requires 4 layers of neurons, the 9 core system with 16 neurons per core was used: the first and second layers used two cores each, and the third and fourth layers used one core each. The routing network in this system is static routing with the time delay based routing protocol. On the FPGA, the application results were verified using the SignalTap II tool in Quartus II, as shown in Figure 5.1.
Figure 5.1: Example of SignalTap II results.
In Figure 5.1, the port OUT[15..0] carries the calculation result, a 16-bit fixed-point value: the upper eight bits represent the integer part and the lower eight bits the fractional part. To verify the FPGA results, the corresponding Matlab results are shown in Table 5.6 and Table 5.7.

Table 5.6: ECG process result of one pattern data.
Class number    Software calculation    FPGA calculation

Table 5.7: Edge detection result of some random pixels.
Software calculation    FPGA calculation

For ECG, the result of one pattern is shown in Table 5.6. For edge detection, some randomly picked pixel results are shown in Table 5.7. The calculation results are converted from fixed-point to floating-point values. The FPGA calculations match the software calculations. The diagnosis for this ECG pattern is class 1, which in the Arrhythmia Data Set corresponds to a normal ECG.

5.4 PERFORMANCE COMPARISON WITH A RISC PROCESSOR

We compared the FPGA performance with an implementation of the applications on an Intel E8400. The edge detection application was implemented in non-neural-network form on the Intel processor, as that would be most efficient on a RISC processor. The ECG application was implemented as a neural network.

Table 5.8: Applications throughput.
Application                               Intel E8400    FPGA
Edge detection (million pixels/second)
ECG (inputs/second)

Table 5.8 shows the throughput achieved on the Intel processor compared with the FPGA. The FPGA implementation provided about 3x and 127x higher throughput than the Intel processor for the edge detection and ECG applications, respectively.

CHAPTER 6 CONCLUSION

The FPGA implementation of this neuromorphic network processor is 3x and 127x faster than an Intel E8400 processor for the edge detection and ECG applications, respectively. Considering resource utilization and system stability, a hardware controlled communication routing network is not a good choice. On the other hand, for a large and complex network, it will be difficult to schedule an efficient time delay network. The separate data lines prevent the hardware controlled communication routing network from scaling to a large network. If the communication method between cores became more comprehensive than the current simple handshaking design, the data lines might be shared among all cores, but the high FPGA resource usage would remain a big problem. Given these disadvantages of static routing, dynamic routing, although it uses more resources, yields a much more stable network than static routing, especially for applications with large amounts of data.

BIBLIOGRAPHY
[1] J. Arthur, P. Merolla, F. Akopyan, et al., "Building Block of a Programmable Neuromorphic Substrate: A Digital Neurosynaptic Core," International Joint Conference on Neural Networks (IJCNN), June.
[2] T. M. Taha, R. Hasan, C. Yakopcic, and M. R. McLean, "Exploring the Design Space of Specialized Multicore Neural Processors," IEEE International Joint Conference on Neural Networks (IJCNN).
[3] R. Hasan and T. M. Taha, "Enabling Back Propagation Training of Memristor Crossbar Neuromorphic Processors," IEEE International Joint Conference on Neural Networks (IJCNN).
[4] C. Yakopcic, R. Hasan, and T. M. Taha, "Efficacy of Memristive Crossbars for Neuromorphic Processors," IEEE International Joint Conference on Neural Networks (IJCNN).
[5] R. Hasan and T. M. Taha, "On-Chip Static vs. Dynamic Routing for Feed Forward Neural Networks on Multicore Neuromorphic Architectures," International Conference on Advances in Electrical Engineering (ICAEE), December.
[6] F. Brandner and M. Schoeberl, "Static Routing in Symmetric Real-Time Network-on-Chips."
[7]


ISSN: [Bilani* et al.,7(2): February, 2018] Impact Factor: 5.164 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY A REVIEWARTICLE OF SDRAM DESIGN WITH NECESSARY CRITERIA OF DDR CONTROLLER Sushmita Bilani *1 & Mr. Sujeet Mishra 2 *1 M.Tech Student

More information

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard

FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE Standard FPGA Implementation of Multiplier for Floating- Point Numbers Based on IEEE 754-2008 Standard M. Shyamsi, M. I. Ibrahimy, S. M. A. Motakabber and M. R. Ahsan Dept. of Electrical and Computer Engineering

More information

The extreme Adaptive DSP Solution to Sensor Data Processing

The extreme Adaptive DSP Solution to Sensor Data Processing The extreme Adaptive DSP Solution to Sensor Data Processing Abstract Martin Vorbach PACT XPP Technologies Leo Mirkin Sky Computers, Inc. The new ISR mobile autonomous sensor platforms present a difficult

More information

Digital Systems Design. System on a Programmable Chip

Digital Systems Design. System on a Programmable Chip Digital Systems Design Introduction to System on a Programmable Chip Dr. D. J. Jackson Lecture 11-1 System on a Programmable Chip Generally involves utilization of a large FPGA Large number of logic elements

More information

CS 101, Mock Computer Architecture

CS 101, Mock Computer Architecture CS 101, Mock Computer Architecture Computer organization and architecture refers to the actual hardware used to construct the computer, and the way that the hardware operates both physically and logically

More information

Neuromorphic Computing: Our approach to developing applications using a new model of computing

Neuromorphic Computing: Our approach to developing applications using a new model of computing Neuromorphic Computing: Our approach to developing applications using a new model of computing David J. Mountain Senior Technical Director Advanced Computing Systems Research Program Background Info Outline

More information

Teaching Computer Architecture with FPGA Soft Processors

Teaching Computer Architecture with FPGA Soft Processors Teaching Computer Architecture with FPGA Soft Processors Dr. Andrew Strelzoff 1 Abstract Computer Architecture has traditionally been taught to Computer Science students using simulation. Students develop

More information

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Maheswari Murali * and Seetharaman Gopalakrishnan # * Assistant professor, J. J. College of Engineering and Technology,

More information

Instantaneously trained neural networks with complex inputs

Instantaneously trained neural networks with complex inputs Louisiana State University LSU Digital Commons LSU Master's Theses Graduate School 2003 Instantaneously trained neural networks with complex inputs Pritam Rajagopal Louisiana State University and Agricultural

More information

GPU Programming Using NVIDIA CUDA

GPU Programming Using NVIDIA CUDA GPU Programming Using NVIDIA CUDA Siddhante Nangla 1, Professor Chetna Achar 2 1, 2 MET s Institute of Computer Science, Bandra Mumbai University Abstract: GPGPU or General-Purpose Computing on Graphics

More information

1.3 Data processing; data storage; data movement; and control.

1.3 Data processing; data storage; data movement; and control. CHAPTER 1 OVERVIEW ANSWERS TO QUESTIONS 1.1 Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical

More information

Low Power Design Techniques

Low Power Design Techniques Low Power Design Techniques August 2005, ver 1.0 Application Note 401 Introduction This application note provides low-power logic design techniques for Stratix II and Cyclone II devices. These devices

More information

Verilog for High Performance

Verilog for High Performance Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes

More information

FlexRay The Hardware View

FlexRay The Hardware View A White Paper Presented by IPextreme FlexRay The Hardware View Stefan Schmechtig / Jens Kjelsbak February 2006 FlexRay is an upcoming networking standard being established to raise the data rate, reliability,

More information

Section 3. System Integration

Section 3. System Integration Section 3. System Integration This section includes the following chapters: Chapter 9, Configuration, Design Security, and Remote System Upgrades in the Cyclone III Device Family Chapter 10, Hot-Socketing

More information

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Yufei Ma, Naveen Suda, Yu Cao, Jae-sun Seo, Sarma Vrudhula School of Electrical, Computer and Energy Engineering School

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING UTN - FRBA 2011 www.electron.frba.utn.edu.ar/dplab Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable.

More information

ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL

ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY BHARAT SIGINAM IN

More information

Neural Computer Architectures

Neural Computer Architectures Neural Computer Architectures 5kk73 Embedded Computer Architecture By: Maurice Peemen Date: Convergence of different domains Neurobiology Applications 1 Constraints Machine Learning Technology Innovations

More information

Co-synthesis and Accelerator based Embedded System Design

Co-synthesis and Accelerator based Embedded System Design Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer

More information

Lecture 11: Packet forwarding

Lecture 11: Packet forwarding Lecture 11: Packet forwarding Anirudh Sivaraman 2017/10/23 This week we ll talk about the data plane. Recall that the routing layer broadly consists of two parts: (1) the control plane that computes routes

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

Design of a System-on-Chip Switched Network and its Design Support Λ

Design of a System-on-Chip Switched Network and its Design Support Λ Design of a System-on-Chip Switched Network and its Design Support Λ Daniel Wiklund y, Dake Liu Dept. of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract As the degree of

More information

5. Configuring Cyclone FPGAs

5. Configuring Cyclone FPGAs 5. Configuring Cyclone FPGAs C51013-1.5 Introduction You can configure Cyclone TM FPGAs using one of several configuration schemes, including the active serial (AS) configuration scheme. This scheme is

More information

EMBEDDED SOPC DESIGN WITH NIOS II PROCESSOR AND VHDL EXAMPLES

EMBEDDED SOPC DESIGN WITH NIOS II PROCESSOR AND VHDL EXAMPLES EMBEDDED SOPC DESIGN WITH NIOS II PROCESSOR AND VHDL EXAMPLES Pong P. Chu Cleveland State University A JOHN WILEY & SONS, INC., PUBLICATION PREFACE An SoC (system on a chip) integrates a processor, memory

More information

Laboratory Exercise 7

Laboratory Exercise 7 Laboratory Exercise 7 Finite State Machines This is an exercise in using finite state machines. Part I We wish to implement a finite state machine (FSM) that recognizes two specific sequences of applied

More information

Lecture 1: Introduction Course arrangements Recap of basic digital design concepts EDA tool demonstration

Lecture 1: Introduction Course arrangements Recap of basic digital design concepts EDA tool demonstration TKT-1426 Digital design for FPGA, 6cp Fall 2011 http://www.tkt.cs.tut.fi/kurssit/1426/ Tampere University of Technology Department of Computer Systems Waqar Hussain Lecture Contents Lecture 1: Introduction

More information

White Paper Assessing FPGA DSP Benchmarks at 40 nm

White Paper Assessing FPGA DSP Benchmarks at 40 nm White Paper Assessing FPGA DSP Benchmarks at 40 nm Introduction Benchmarking the performance of algorithms, devices, and programming methodologies is a well-worn topic among developers and research of

More information

Embedded Systems. 7. System Components

Embedded Systems. 7. System Components Embedded Systems 7. System Components Lothar Thiele 7-1 Contents of Course 1. Embedded Systems Introduction 2. Software Introduction 7. System Components 10. Models 3. Real-Time Models 4. Periodic/Aperiodic

More information

Content Addressable Memory (CAM) Implementation and Power Analysis on FPGA. Teng Hu. B.Eng., Southwest Jiaotong University, 2008

Content Addressable Memory (CAM) Implementation and Power Analysis on FPGA. Teng Hu. B.Eng., Southwest Jiaotong University, 2008 Content Addressable Memory (CAM) Implementation and Power Analysis on FPGA by Teng Hu B.Eng., Southwest Jiaotong University, 2008 A Report Submitted in Partial Fulfillment of the Requirements for the Degree

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Global Journal of Engineering and Technology Review

Global Journal of Engineering and Technology Review Global Journal of Engineering and Technology Review Journal homepage: www.gjetr.org Global J. Eng. Tec. Review 3 (2) 30 38 (2018) Hardware and Software Implementation of Artificial Neural Network in Hybrid

More information

JTAG TAP CONTROLLER PROGRAMMING USING FPGA BOARD

JTAG TAP CONTROLLER PROGRAMMING USING FPGA BOARD JTAG TAP CONTROLLER PROGRAMMING USING FPGA BOARD 1 MOHAMED JEBRAN.P, 2 SHIREEN FATHIMA, 3 JYOTHI M 1,2 Assistant Professor, Department of ECE, HKBKCE, Bangalore-45. 3 Software Engineer, Imspired solutions,

More information

Configuring APEX 20K, FLEX 10K & FLEX 6000 Devices

Configuring APEX 20K, FLEX 10K & FLEX 6000 Devices Configuring APEX 20K, FLEX 10K & FLEX 6000 Devices December 1999, ver. 1.02 Application Note 116 Introduction APEX TM 20K, FLEX 10K, and FLEX 6000 devices can be configured using one of six configuration

More information

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL for Synthesis. Course Description. Course Duration. Goals VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes

More information

Logic Optimization Techniques for Multiplexers

Logic Optimization Techniques for Multiplexers Logic Optimiation Techniques for Multiplexers Jennifer Stephenson, Applications Engineering Paul Metgen, Software Engineering Altera Corporation 1 Abstract To drive down the cost of today s highly complex

More information

Chapter 2 Parallel Hardware

Chapter 2 Parallel Hardware Chapter 2 Parallel Hardware Part I. Preliminaries Chapter 1. What Is Parallel Computing? Chapter 2. Parallel Hardware Chapter 3. Parallel Software Chapter 4. Parallel Applications Chapter 5. Supercomputers

More information

DE2 Board & Quartus II Software

DE2 Board & Quartus II Software January 23, 2015 Contact and Office Hours Teaching Assistant (TA) Sergio Contreras Office Office Hours Email SEB 3259 Tuesday & Thursday 12:30-2:00 PM Wednesday 1:30-3:30 PM contre47@nevada.unlv.edu Syllabus

More information

ECE 471 Embedded Systems Lecture 2

ECE 471 Embedded Systems Lecture 2 ECE 471 Embedded Systems Lecture 2 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 7 September 2018 Announcements Reminder: The class notes are posted to the website. HW#1 will

More information

13. Configuring Stratix & Stratix GX Devices

13. Configuring Stratix & Stratix GX Devices 13. Configuring Stratix & Stratix GX Devices S52013-2.0 Introduction You can configure Stratix TM and Stratix GX devices using one of several configuration schemes. All configuration schemes use either

More information

SpiNNaker - a million core ARM-powered neural HPC

SpiNNaker - a million core ARM-powered neural HPC The Advanced Processor Technologies Group SpiNNaker - a million core ARM-powered neural HPC Cameron Patterson cameron.patterson@cs.man.ac.uk School of Computer Science, The University of Manchester, UK

More information

REAL TIME DIGITAL SIGNAL PROCESSING

REAL TIME DIGITAL SIGNAL PROCESSING REAL TIME DIGITAL SIGNAL PROCESSING UTN-FRBA 2010 Introduction Why Digital? A brief comparison with analog. Advantages Flexibility. Easily modifiable and upgradeable. Reproducibility. Don t depend on components

More information

Introduction of the Research Based on FPGA at NICS

Introduction of the Research Based on FPGA at NICS Introduction of the Research Based on FPGA at NICS Rong Luo Nano Integrated Circuits and Systems Lab, Department of Electronic Engineering, Tsinghua University Beijing, 100084, China 1 luorong@tsinghua.edu.cn

More information

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM 1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM 1.1 Introduction Given that digital logic and memory devices are based on two electrical states (on and off), it is natural to use a number

More information

Performance Evaluation of AODV and DSDV Routing Protocol in wireless sensor network Environment

Performance Evaluation of AODV and DSDV Routing Protocol in wireless sensor network Environment 2012 International Conference on Computer Networks and Communication Systems (CNCS 2012) IPCSIT vol.35(2012) (2012) IACSIT Press, Singapore Performance Evaluation of AODV and DSDV Routing Protocol in wireless

More information

FPGA Matrix Multiplier

FPGA Matrix Multiplier FPGA Matrix Multiplier In Hwan Baek Henri Samueli School of Engineering and Applied Science University of California Los Angeles Los Angeles, California Email: chris.inhwan.baek@gmail.com David Boeck Henri

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

High Performance Interconnect and NoC Router Design

High Performance Interconnect and NoC Router Design High Performance Interconnect and NoC Router Design Brinda M M.E Student, Dept. of ECE (VLSI Design) K.Ramakrishnan College of Technology Samayapuram, Trichy 621 112 brinda18th@gmail.com Devipoonguzhali

More information

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults 1/45 1/22 MICRO-46, 9 th December- 213 Davis, California udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults Ritesh Parikh and Valeria Bertacco Electrical Engineering & Computer

More information

Computer Architecture. R. Poss

Computer Architecture. R. Poss Computer Architecture R. Poss 1 ca01-10 september 2015 Course & organization 2 ca01-10 september 2015 Aims of this course The aims of this course are: to highlight current trends to introduce the notion

More information

INTRUSION DETECTION AND HIGH-SPEED PACKET CLASSIFICATION USING MEMRISTOR CROSSBARS. Thesis. Submitted to. The School of Engineering of the

INTRUSION DETECTION AND HIGH-SPEED PACKET CLASSIFICATION USING MEMRISTOR CROSSBARS. Thesis. Submitted to. The School of Engineering of the INTRUSION DETECTION AND HIGH-SPEED PACKET CLASSIFICATION USING MEMRISTOR CROSSBARS Thesis Submitted to The School of Engineering of the UNIVERSITY OF DAYTON In Partial Fulfillment of the Requirements for

More information

PREFACE. Changes to the SOPC Edition

PREFACE. Changes to the SOPC Edition PREFACE Changes to the SOPC Edition Rapid Prototyping of Digital Systems provides an exciting and challenging laboratory component for undergraduate digital logic and computer design courses using FPGAs

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design white paper Intel FPGA Applying the Benefits of on a Chip Architecture to FPGA System Design Authors Kent Orthner Senior Manager, Software and IP Intel Corporation Table of Contents Abstract...1 Introduction...1

More information

4. Hot Socketing & Power-On Reset

4. Hot Socketing & Power-On Reset 4. Hot Socketing & Power-On Reset CII51004-3.1 Introduction Cyclone II devices offer hot socketing (also known as hot plug-in, hot insertion, or hot swap) and power sequencing support without the use of

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction In a packet-switched network, packets are buffered when they cannot be processed or transmitted at the rate they arrive. There are three main reasons that a router, with generic

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation. ISSN 2319-8885 Vol.03,Issue.32 October-2014, Pages:6436-6440 www.ijsetr.com Design and Modeling of Arithmetic and Logical Unit with the Platform of VLSI N. AMRUTHA BINDU 1, M. SAILAJA 2 1 Dept of ECE,

More information

A Proposal for a High Speed Multicast Switch Fabric Design

A Proposal for a High Speed Multicast Switch Fabric Design A Proposal for a High Speed Multicast Switch Fabric Design Cheng Li, R.Venkatesan and H.M.Heys Faculty of Engineering and Applied Science Memorial University of Newfoundland St. John s, NF, Canada AB X

More information

FYS Data acquisition & control. Introduction. Spring 2018 Lecture #1. Reading: RWI (Real World Instrumentation) Chapter 1.

FYS Data acquisition & control. Introduction. Spring 2018 Lecture #1. Reading: RWI (Real World Instrumentation) Chapter 1. FYS3240-4240 Data acquisition & control Introduction Spring 2018 Lecture #1 Reading: RWI (Real World Instrumentation) Chapter 1. Bekkeng 14.01.2018 Topics Instrumentation: Data acquisition and control

More information

Cover TBD. intel Quartus prime Design software

Cover TBD. intel Quartus prime Design software Cover TBD intel Quartus prime Design software Fastest Path to Your Design The Intel Quartus Prime software is revolutionary in performance and productivity for FPGA, CPLD, and SoC designs, providing a

More information

Fault Tolerant Parallel Filters Based On Bch Codes

Fault Tolerant Parallel Filters Based On Bch Codes RESEARCH ARTICLE OPEN ACCESS Fault Tolerant Parallel Filters Based On Bch Codes K.Mohana Krishna 1, Mrs.A.Maria Jossy 2 1 Student, M-TECH(VLSI Design) SRM UniversityChennai, India 2 Assistant Professor

More information

Design and Implementation of a Super Scalar DLX based Microprocessor

Design and Implementation of a Super Scalar DLX based Microprocessor Design and Implementation of a Super Scalar DLX based Microprocessor 2 DLX Architecture As mentioned above, the Kishon is based on the original DLX as studies in (Hennessy & Patterson, 1996). By: Amnon

More information

Optimize DSP Designs and Code using Fixed-Point Designer

Optimize DSP Designs and Code using Fixed-Point Designer Optimize DSP Designs and Code using Fixed-Point Designer MathWorks Korea 이웅재부장 Senior Application Engineer 2013 The MathWorks, Inc. 1 Agenda Fixed-point concepts Introducing Fixed-Point Designer Overview

More information

Homework 1 50 points. Quantitative Comparison of Packet Switching and Circuit Switching 20 points Consider the two scenarios below:

Homework 1 50 points. Quantitative Comparison of Packet Switching and Circuit Switching 20 points Consider the two scenarios below: Homework 1 50 points Quantitative Comparison of Packet Switching and Circuit Switching 20 points Consider the two scenarios below: A circuit-switching scenario in which Ncs users, each requiring a bandwidth

More information

Mohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu

Mohsen Imani. University of California San Diego. System Energy Efficiency Lab seelab.ucsd.edu Mohsen Imani University of California San Diego Winter 2016 Technology Trend for IoT http://www.flashmemorysummit.com/english/collaterals/proceedi ngs/2014/20140807_304c_hill.pdf 2 Motivation IoT significantly

More information

"On the Capability and Achievable Performance of FPGAs for HPC Applications"

On the Capability and Achievable Performance of FPGAs for HPC Applications "On the Capability and Achievable Performance of FPGAs for HPC Applications" Wim Vanderbauwhede School of Computing Science, University of Glasgow, UK Or in other words "How Fast Can Those FPGA Thingies

More information

Developing a Data Driven System for Computational Neuroscience

Developing a Data Driven System for Computational Neuroscience Developing a Data Driven System for Computational Neuroscience Ross Snider and Yongming Zhu Montana State University, Bozeman MT 59717, USA Abstract. A data driven system implies the need to integrate

More information

DIGITAL DESIGN TECHNOLOGY & TECHNIQUES

DIGITAL DESIGN TECHNOLOGY & TECHNIQUES DIGITAL DESIGN TECHNOLOGY & TECHNIQUES CAD for ASIC Design 1 INTEGRATED CIRCUITS (IC) An integrated circuit (IC) consists complex electronic circuitries and their interconnections. William Shockley et

More information

3 Data Storage 3.1. Foundations of Computer Science Cengage Learning

3 Data Storage 3.1. Foundations of Computer Science Cengage Learning 3 Data Storage 3.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List five different data types used in a computer. Describe how

More information

Chapter 2. Cyclone II Architecture

Chapter 2. Cyclone II Architecture Chapter 2. Cyclone II Architecture CII51002-1.0 Functional Description Cyclone II devices contain a two-dimensional row- and column-based architecture to implement custom logic. Column and row interconnects

More information

High Performance Computing. University questions with solution

High Performance Computing. University questions with solution High Performance Computing University questions with solution Q1) Explain the basic working principle of VLIW processor. (6 marks) The following points are basic working principle of VLIW processor. The

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 1. Computer Abstractions and Technology COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology The Computer Revolution Progress in computer technology Underpinned by Moore

More information

3.1 Description of Microprocessor. 3.2 History of Microprocessor

3.1 Description of Microprocessor. 3.2 History of Microprocessor 3.0 MAIN CONTENT 3.1 Description of Microprocessor The brain or engine of the PC is the processor (sometimes called microprocessor), or central processing unit (CPU). The CPU performs the system s calculating

More information

Power Optimization in FPGA Designs

Power Optimization in FPGA Designs Mouzam Khan Altera Corporation mkhan@altera.com ABSTRACT IC designers today are facing continuous challenges in balancing design performance and power consumption. This task is becoming more critical as

More information

Design and Implementation of MP3 Player Based on FPGA Dezheng Sun

Design and Implementation of MP3 Player Based on FPGA Dezheng Sun Applied Mechanics and Materials Online: 2013-10-31 ISSN: 1662-7482, Vol. 443, pp 746-749 doi:10.4028/www.scientific.net/amm.443.746 2014 Trans Tech Publications, Switzerland Design and Implementation of

More information

Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication

Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication Introduction All processors offer some form of instructions to add, subtract, and manipulate data.

More information

Design and Implementation of 3-D DWT for Video Processing Applications

Design and Implementation of 3-D DWT for Video Processing Applications Design and Implementation of 3-D DWT for Video Processing Applications P. Mohaniah 1, P. Sathyanarayana 2, A. S. Ram Kumar Reddy 3 & A. Vijayalakshmi 4 1 E.C.E, N.B.K.R.IST, Vidyanagar, 2 E.C.E, S.V University

More information

LOW POWER FPGA IMPLEMENTATION OF REAL-TIME QRS DETECTION ALGORITHM

LOW POWER FPGA IMPLEMENTATION OF REAL-TIME QRS DETECTION ALGORITHM LOW POWER FPGA IMPLEMENTATION OF REAL-TIME QRS DETECTION ALGORITHM VIJAYA.V, VAISHALI BARADWAJ, JYOTHIRANI GUGGILLA Electronics and Communications Engineering Department, Vaagdevi Engineering College,

More information

Introduction to VHDL Design on Quartus II and DE2 Board

Introduction to VHDL Design on Quartus II and DE2 Board ECP3116 Digital Computer Design Lab Experiment Duration: 3 hours Introduction to VHDL Design on Quartus II and DE2 Board Objective To learn how to create projects using Quartus II, design circuits and

More information

Problem Set 1 Solutions

Problem Set 1 Solutions CSE 260 Digital Computers: Organization and Logical Design Jon Turner Problem Set 1 Solutions 1. Give a brief definition of each of the following parts of a computer system: CPU, main memory, floating

More information

ECE 1160/2160 Embedded Systems Design. Midterm Review. Wei Gao. ECE 1160/2160 Embedded Systems Design

ECE 1160/2160 Embedded Systems Design. Midterm Review. Wei Gao. ECE 1160/2160 Embedded Systems Design ECE 1160/2160 Embedded Systems Design Midterm Review Wei Gao ECE 1160/2160 Embedded Systems Design 1 Midterm Exam When: next Monday (10/16) 4:30-5:45pm Where: Benedum G26 15% of your final grade What about:

More information

Embedded Systems: Hardware Components (part II) Todor Stefanov

Embedded Systems: Hardware Components (part II) Todor Stefanov Embedded Systems: Hardware Components (part II) Todor Stefanov Leiden Embedded Research Center, Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded

More information

Computer Hardware Requirements for Real-Time Applications

Computer Hardware Requirements for Real-Time Applications Lecture (4) Computer Hardware Requirements for Real-Time Applications Prof. Kasim M. Al-Aubidy Computer Engineering Department Philadelphia University Real-Time Systems, Prof. Kasim Al-Aubidy 1 Lecture

More information

Chapter 4. MARIE: An Introduction to a Simple Computer. Chapter 4 Objectives. 4.1 Introduction. 4.2 CPU Basics

Chapter 4. MARIE: An Introduction to a Simple Computer. Chapter 4 Objectives. 4.1 Introduction. 4.2 CPU Basics Chapter 4 Objectives Learn the components common to every modern computer system. Chapter 4 MARIE: An Introduction to a Simple Computer Be able to explain how each component contributes to program execution.

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

High Performance Applications on Reconfigurable Clusters

High Performance Applications on Reconfigurable Clusters High Performance Applications on Reconfigurable Clusters by Zahi Samir Nakad Thesis Submitted to the Faculty of Virginia Polytechnic Institute and State University In partial fulfillment of the requirements

More information

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Young Hoon Kang, Taek-Jun Kwon, and Jeff Draper {youngkan, tjkwon, draper}@isi.edu University of Southern California

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information