Deadlock-Avoidance Technique for Fault-Tolerant 3D-OASIS-Network-on-Chip


Akram Ben Ahmed, Abderazek Ben Abdallah
The University of Aizu, Graduate School of Computer Science and Engineering, Aizu-Wakamatsu, Japan

Abstract

During the last few decades, 3-dimensional Networks-on-Chip (3D-NoCs) have been proposed as a promising architecture that combines the high parallelism of the Network-on-Chip interconnect paradigm with the high performance and lower interconnect power of 3-dimensional integrated circuits (3D-ICs). However, 3D-NoC systems are exposed to a variety of manufacturing and design factors that make them vulnerable to faults, which can corrupt message transfers or even cause catastrophic system failures. A 3D-NoC system should therefore tolerate both transient malfunctions and permanent physical damage. Most of the existing 3D-NoC systems rely on routing algorithms to ensure fault tolerance; however, one of the serious problems these routing algorithms may face is deadlock, which can block some routers in the network or even the entire system. Consequently, deadlock should be either avoided or detected and removed. In this paper, we present a low-cost deadlock-recovery technique for fault-tolerant 3D-NoC systems. The proposed technique detects the presence of deadlock in the network and removes it without a considerable performance drop. The proposed technique was implemented on our earlier designed 3D-NoC system, named 3D-OASIS-NoC, which adopts the Look-Ahead-Fault-Tolerant routing algorithm (LAFT). Evaluation results show that...

Keywords: 3D NoC; Concurrent; Fault-tolerant; Routing; Deadlock

I. Introduction

Based on a simple and scalable architecture platform, Network-on-Chip (NoC) [1], [2] connects processors, memories and other custom designs using switches that distribute packets on a hop-by-hop basis. This increases bandwidth and performance and removes the interconnect bottleneck of traditional bus-based systems.
At the same time, three-dimensional integrated circuits (3D-ICs) [3], [4] have attracted a lot of attention as a potential solution to the interconnect bottleneck. Thanks to the reduced average interconnect length, 3D-ICs can achieve higher performance and lower interconnect power consumption [5], [6]. Moreover, circuitry is more immune to noise in 3D-IC chips [4], and the realization of mixed technologies has become possible [7], [8]. Combining the NoC structure with the benefits of 3D integration offers a promising 3D-NoC architecture. This combination provides a new horizon for NoC designs to satisfy the high requirements of future large-scale applications.

Due to the complex nature of 3D-IC fabrics and the continuing shrinkage of semiconductor components, 3D-NoC systems are becoming increasingly vulnerable to failures caused by physical defects (permanent faults) and to transient faults caused by some component failures [9]. 3D-NoC systems are susceptible to many kinds of faults, for example in routers, IPs and links. As Lehtonen et al. stated in [10], the majority of failures (80%) are caused by transient faults, while the rest originate mainly in permanent and intermittent faults. As a safety requirement, these kinds of faults should not cause a complete system failure: 3D-NoCs should be able to keep running and deliver correct messages to their destination nodes, even with degraded performance. This can be done either by employing a fault-tolerant mechanism that avoids or deactivates the faulty components, or by reconfiguring the system without causing any significant performance drop.

Figure 1. Deadlock example in fault-tolerant 3D-NoC system.

As is the case for every adaptive routing algorithm, the deadlock problem may arise with fault-tolerant routing schemes. Deadlock is one of the major issues in NoC

systems; it occurs when packets in different buffers are unable to progress because they depend on each other, forming a dependency cycle. In fault-tolerant algorithms, deadlock is more likely to occur because the presence of faults adds more restrictions to the routing decision. Figure 1 illustrates a deadlock example in a fault-tolerant 3D-NoC system. The dependency is caused by the exchange of flits between R002 and R001. Due to the presence of faults, the choices for minimal routing are limited and both communications depend on each other; thus, neither of them can make progress through the network. In the same figure, we can see that flits Dest010 and Dest000, stored in the input-ports of R011 and R001 respectively, are victims of this deadlock; i.e., even though their output-channel is free, they have to wait in the buffer until the blocking is resolved.

Most of the existing 3D-NoC systems [11], [12], [13], [14] use Virtual-Channel (VC) [15] as a deadlock-avoidance technique. As illustrated in Fig. 2, VC divides the input-buffer into smaller queues which are independent of each other and managed by an arbiter. When a blockage happens in one VC, the other ones are not affected and continue to request their corresponding output-channel. In this fashion, nonblocked requests are served and their slots are freed to host other incoming flits. The number of VCs depends on the complexity of the algorithm and on how likely deadlock is to happen; thus, some architectures use two VCs [11], others use three VCs [12], and some even use four VCs [13], [14] to ensure deadlock-freedom for their fault-tolerant routing algorithms.

Both VC and the Virtual-Output-Queue (VOQ) technique [16] described below ensure deadlock-freedom; however, employing such techniques is costly in terms of hardware and implementation complexity. This cost comes from the arbitration needed to handle the different requests coming from the multiple VCs/VOQs at each input-port.
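To make the notion of a dependency cycle concrete, the following minimal Python sketch models each blocked buffer as a node that waits for another buffer and searches the resulting wait-for graph for a cycle. The buffer names loosely mirror Figure 1, but the graph itself, the function name `find_cycle`, and the dictionary representation are illustrative assumptions and are not taken from the 3D-OASIS-NoC design.

```python
def find_cycle(waits_for):
    """Return one dependency cycle as a list of buffers, or None.

    `waits_for` maps a buffer name to the buffers it is waiting on
    (hypothetical representation, chosen only for illustration).
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:             # back edge: nxt is on the current path
                return path[path.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(waits_for):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Illustrative wait-for graph: two buffers wait on each other, and a third
# buffer is a "victim" that waits on one of them without being in the cycle.
example = {
    "R002.in": ["R001.in"],
    "R001.in": ["R002.in"],
    "R011.in": ["R001.in"],
}
cycle = find_cycle(example)
```

In this example the cycle contains only the two mutually dependent buffers; the victim buffer is blocked by the cycle but is not part of it, which matches the distinction the text draws between the deadlocked flits and the flits that merely wait behind them.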
Figure 2. 3D VC router architecture.

Another technique used for deadlock avoidance is called Virtual-Output-Queue (VOQ) [16]. In VOQ, as shown in Fig. 3, the input-buffer is divided into different queues that host incoming flits according to their corresponding output-channel; i.e., VOQ(i,j) stores flits coming from input-port i that wish to access output-port j. For each output-channel, a 7x1 crossbar(i) is dedicated to handle the traversal of flits coming from the different input-channels and asking for the grant of output-channel(j). According to [17], VOQ can achieve a lower switch delay than VC with the same efficiency.

Figure 3. 3D VOQ router architecture.

In another work, Pasricha et al. [18] extended a 2D turn model for partially adaptive routing to the third dimension. The proposed scheme combines the 4N-FIRST and 4P-FIRST schemes into a lightweight 4NP-FIRST scheme. On the other hand, this turn model introduces some routing restrictions to prevent deadlock. These restrictions lead to nonminimal routing selections, where in some cases the packet may need many additional hops to reach its destination.

Based on these facts, in this paper we propose a low-cost deadlock-recovery technique, named Random-Access-Buffer (RAB). RAB first detects the occurrence of deadlock, then drops the blocking request and looks for other requests that can be served, in order to free some slots in the buffer and break the dependency cycle causing the deadlock. RAB was implemented on our fault-tolerant 3D-OASIS-NoC system [19], [20], [21], which employs the Look-Ahead-Fault-Tolerant routing algorithm (LAFT) [22]. LAFT boosts the performance of 3D-OASIS-NoC while simultaneously guaranteeing fault tolerance with almost no performance degradation.

The rest of the paper is organized as follows. In Section 2, the 3D-OASIS-NoC system architecture is overviewed, including the adopted Look-Ahead-Fault-Tolerant routing algorithm (LAFT).
The proposed Random-Access-Buffer (RAB) for deadlock recovery is introduced in Section 3. Section 4 is dedicated to the evaluation methodology and results, and finally we end the paper with the conclusion and future work in Section 5.

Figure 4. Look-Ahead-Fault-Tolerant routing algorithm flow chart.

II. Fault-Tolerant 3D-OASIS-NoC System Overview

A. Look-Ahead-Fault-Tolerant routing algorithm

To keep the benefits of look-ahead routing [20], [23], the Look-Ahead-Fault-Tolerant routing algorithm (LAFT) [22] should be able to perform the routing decision for the next node, taking into consideration its link status, and select the best minimal path. Before explaining LAFT, two important assumptions should be mentioned. First, the links connecting the PE to the local input and output ports are always nonfaulty. Second, we assume that there exists at least one minimal path between a (source, destination) pair. These assumptions are natural and necessary to deliver any flit from source to destination.

We employed a simple fault detection mechanism based on a single multiplexer in each input-port that reads the incoming flit and verifies whether it is corrupted or not. Depending on this verification, the fault-control module sends a single-bit signal to the upstream node that can be either 0 or 1, for valid or faulty respectively. Each router sends the collected information about its own fault status to each of the six neighboring nodes and also to the Network-Interface of the attached PE. This information is carried by a six-bit signal representing the router link status in each direction (North, East, Up, South, West and Down).

It is important to mention that control signals, rather than control flits, are used to transfer the fault information in order to enhance performance. Using control flits would increase the congestion in the router, since data and control flits would compete for the router resources. We also avoided storing this information in registers and used signals instead, to keep the area overhead that additional registers might cause as small as possible.
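As an illustration of the six-bit link-status signal described above, the following Python sketch packs the per-direction fault flags into one integer and reads them back. The direction order follows the text (North, East, Up, South, West, Down) and the convention 0 = valid, 1 = faulty follows the fault-control signal; the bit positions (North in the least significant bit) and the helper names `encode_status` and `healthy_directions` are assumptions made only for this example.

```python
# Direction order as listed in the text; the bit index of each direction
# (North = bit 0 ... Down = bit 5) is an assumption for illustration.
DIRECTIONS = ("North", "East", "Up", "South", "West", "Down")


def encode_status(faulty_directions):
    """Pack the faulty directions into a six-bit value (1 = faulty link)."""
    status = 0
    for bit, direction in enumerate(DIRECTIONS):
        if direction in faulty_directions:
            status |= 1 << bit
    return status


def healthy_directions(status):
    """Return the directions whose bit is 0 (valid link)."""
    return [d for bit, d in enumerate(DIRECTIONS) if not (status >> bit) & 1]


# A router whose East and Down links are faulty.
status = encode_status({"East", "Down"})
usable = healthy_directions(status)
```

Because the whole status fits in six wires, a neighbor can read it combinationally, which is in line with the text's choice of signals over control flits or registers.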
The fault information is read by each input-port where LAFT is executed. Figure 4 illustrates the flow chart of this algorithm. The first phase of the algorithm calculates the next node address depending on the next-port identifier read from the flit. For a given node wishing to send a flit to a given destination, there exist at most three possible directions, through the X, Y and Z dimensions respectively. In the second phase, LAFT calculates these three directions by comparing the x, y and z coordinates of the current and destination nodes concurrently. At the same time, while these directions are being computed, the fault-control module reads the next-port identifier from the flit and sends the appropriate fault information to the corresponding input-port. By the end of this second phase, LAFT has information about the fault status of the next node and also the three possible directions for minimal routing.

In the next phase, the routing selection is performed. For this decision, we adopted a set of prioritized conditions to ensure fault tolerance and high performance both in the presence and in the absence of faults:

1) The selected direction should ensure a minimal path; this condition is given the highest priority in the routing selection.
2) We should select the direction with the largest next-hop path diversity.
3) The congestion status is given the lowest priority.

Depending on these priorities, LAFT reads the fault status of the next node received from the fault-control module and checks the number of possible nonfaulty minimal directions. As illustrated in Fig. 4, if only one nonfaulty minimal direction is obtained, this direction will be selected as out-port for

the next node. If more than one possible minimal direction is available, the algorithm selects the direction which leads to a node with higher path diversity. The diversity value for a given node is the number of possible directions leading to the destination through a minimal path. A node with high diversity results in more routing choices. This means that the probability of finding a nonfaulty link is greater when considering faults. When no faults are detected in the system, selecting the direction with the highest diversity gives more choices to find the least congested direction. As stated in [18], to obtain directions with high diversity, we should select those leading to nodes located in the center of the mesh and avoid routing to the edges of the network.

When the three possible directions are minimal and have the same diversity, the routing selection is made depending on the congestion of each output port. This congestion information is obtained from the stop signal issued by the flow control used in our 3D-OASIS-NoC system. When there is no valid minimal route available, LAFT chooses a nonminimal route while also considering the second and third priorities (path diversity and congestion), as illustrated in Fig. 4.

To better understand how LAFT works, consider Fig. 5. Assume that the current node (labeled C) receives an incoming flit whose next-port identifier, calculated in the previous node, indicates that the out-port for this flit is East (red arrow). The next node address is calculated (labeled N). Three minimal directions are possible for routing: East, North or Up. The East direction will not be selected since the link in this direction is faulty. Therefore, either North or Up can be selected, both of which are minimal and nonfaulty. In this case, the diversity priority is taken into consideration. If Up is selected, where the node in this direction is on one of the network edges, the diversity value is equal to 2 (two possible minimal directions: East or North). However, if North is selected, the diversity value is equal to 3 (East, North or Up). Having the highest diversity, the North out-port (green arrow) is selected for the next node, and it is embedded in the flit to be used in the downstream node to allow the routing calculation and switch allocation to be performed in parallel.

Figure 5. Look-Ahead-Fault-Tolerant routing algorithm example.

B. System architecture

The 3D-OASIS-NoC system architecture [19], [20], [21] is represented in Fig. 6. This figure also depicts the router block diagram and its three main pipeline stages: Buffer Writing (BW), Routing Calculation/Switch Arbitration (RC/SA) and finally the Crossbar Traversal stage (CT). The router is the backbone component of the whole 3D-OASIS-NoC design. Each router has at most 7 input and 7 output ports, where 4 ports are dedicated to the connection to the neighboring routers, one port connects the switch to the local computation tile, and the remaining two ports connect the router to the upper and lower layers to ensure inter-layer communication. The 3D-OASIS-NoC router block diagram is shown in Fig. 6. It contains seven Input-port modules, one for each direction, in addition to the Switch-Allocator (where the STALL-GO flow control and the matrix-arbiter scheduler can be found) and the Crossbar module that handles the transfer of flits to the next neighboring node.

Figure 7. Input-port module architecture.

The Input-port module is represented in Fig. 7. It is composed of two main elements: the Input-buffer and the Route module. Incoming flits from the different neighboring routers, or from the connected computation tile, are first stored in the Input-buffer, where they wait to be processed. This step is considered the first pipeline stage of the flit's life cycle (BW). Each input-buffer can host up to 4 flits.
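The prioritized selection that LAFT performs in the Route module (nonfaulty minimal path first, then next-hop path diversity, then congestion) can be sketched in Python as follows. This is a simplified model under stated assumptions, not the authors' hardware implementation: the coordinate convention (East = +x, North = +y, Up = +z), the function names, the use of sets for faulty and congested ports, and the simplification that diversity counts minimal directions without looking at faults two hops away are all choices made for this illustration. The nonminimal fallback is not modeled.

```python
# Assumed coordinate convention (not stated in the text): East/West along x,
# North/South along y, Up/Down along z.
STEP = {
    "East": (1, 0, 0), "West": (-1, 0, 0),
    "North": (0, 1, 0), "South": (0, -1, 0),
    "Up": (0, 0, 1), "Down": (0, 0, -1),
}
AXES = (("East", "West"), ("North", "South"), ("Up", "Down"))


def minimal_directions(node, dest):
    """Directions that bring `node` one hop closer to `dest` (at most three)."""
    dirs = []
    for axis, (positive, negative) in enumerate(AXES):
        if dest[axis] > node[axis]:
            dirs.append(positive)
        elif dest[axis] < node[axis]:
            dirs.append(negative)
    return dirs


def diversity(node, dest):
    """Number of minimal directions available at `node` towards `dest`."""
    return len(minimal_directions(node, dest))


def select_direction(node, dest, faulty, congested):
    """Choose the out-port at `node`; returns None if no nonfaulty minimal
    direction exists (the nonminimal case is outside this sketch)."""
    candidates = [d for d in minimal_directions(node, dest) if d not in faulty]
    if not candidates:
        return None

    def priority(direction):
        nxt = tuple(c + s for c, s in zip(node, STEP[direction]))
        # Higher diversity first, then prefer ports that are not congested.
        return (-diversity(nxt, dest), direction in congested)

    return min(candidates, key=priority)


# Situation modeled on the Fig. 5 discussion: East is faulty, Up leads to a
# node with diversity 2 and North to a node with diversity 3.
choice = select_direction((0, 0, 0), (2, 2, 1), faulty={"East"}, congested=set())
```

With these hypothetical coordinates the sketch selects North, consistent with the Fig. 5 walk-through; ties that remain after both criteria are broken by candidate order, which is an arbitrary choice of this example.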
After being stored, the flit is fetched from the buffer and advances to the next pipeline stage. The destination addresses (xdest, ydest and zdest) are decoded in order to extract the information about the destination address in

addition to the Next-Port identifier pre-calculated in the previous upstream node. These values, together with the fault information, are sent to the Route circuit where LAFT is executed to determine the New-next-Port direction for the next downstream node while taking into consideration its link fault status. At the same time, the Next-Port identifier is also used to generate the request to the Switch-Allocator, asking for the grant to use the selected output port via the sw-req and port-req signals.

Figure 6. 3D-OASIS-NoC system architecture.

In order to enable the bypass technique [21], [24], two signals are issued from the buffer to give information about the buffer occupancy status. These signals are fifo-empty and fifo-nearly-empty. When the fifo-empty signal is issued, the input-buffer is empty, and a flit arriving at the input port does not need to be stored in the buffer. The flit can therefore skip the buffering stage and advance directly to the next stage (RC and SA).

The sw-req and port-req signals issued from each Input-port module, which give information about the desired output-port, are transmitted to the Switch-Allocator module to perform the arbitration between the different requests. This process is done in parallel with the routing computation performed in the Input-port to form the second pipeline stage, RC/SA. At the end, the Switch-Allocator sends the sw-cntrl signal, which contains all the information about the scheduling result needed by the Crossbar circuit. The Crossbar, forming the last pipeline stage CT, reads the corresponding flit from the granted Input-port and sends it to its allocated output-channel. More details about the 3D-OASIS-NoC architecture can be found in [20], [21].

III. Random-Access-Buffer for Deadlock-recovery

As is the case for every adaptive routing algorithm, the deadlock and livelock issues may arise.
As mentioned in the Introduction, most of the existing routing algorithms either use virtual channels or add restrictions to the routing selection to avoid deadlock. These solutions either suffer from high implementation complexity or incur an additional delay due to the nonminimal approach. In our case, we implemented a technique similar to virtual channels, but much simpler. This technique, named Random-Access-Buffer (RAB), first detects the flit causing the deadlock in the buffer, drops its request, and then looks for another flit whose request can be granted, in order to free some slots in the buffer and break the dependency. Instead of managing many requests at the same time, as is the case with virtual channels, which require additional complexity and delay for the arbitration, RAB handles one request at a time.

Figure 8 shows an example of how RAB works. In each input-port, a buffer-controller (BC) manages the detection of deadlock and handles the assignment of the head and tail addresses. The detection mechanism is based on a timer: if the request being processed is not served within a given period, a flag is raised to signal the presence of a deadlock (Figure 8 (1)). Whether the request has been served is determined by reading the grant signal received from the Switch-Allocator (sw-gr). When the flag is raised, the BC reads the head of the next packet in the buffer and checks whether its requested out-port is different from the one previously flagged as blocked. When it finds a request whose channel is free (Figure 8 (2)), it sends a request to the Switch-Allocator to be served. When the request is granted, the flits of the granted packet are dequeued from the buffer and the freed slots can be used to host another incoming packet (Figure 8 (3)). After new flits are enqueued in the buffer, the blocked packet is checked again (Figure 8 (4)). The BC receives a grant for the requested direction (North) and the packet is dequeued from the buffer.
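The RAB behavior walked through above can be approximated by the following Python sketch of a buffer-controller. It is a behavioral model under explicit assumptions, not the Verilog design: packets are treated as single units rather than sequences of flits, the timeout value is arbitrary (the text does not give the timer period), the set `free_ports` stands in for the grant signal from the Switch-Allocator, and the class and method names are hypothetical.

```python
class RandomAccessBuffer:
    """Behavioral sketch of one input-port buffer with RAB-style recovery."""

    def __init__(self, timeout=4):
        self.packets = []        # (packet_id, out_port); index 0 is the head
        self.timeout = timeout   # assumed timer period, in arbitration cycles
        self.wait = 0            # cycles the head request has gone unserved

    def enqueue(self, packet_id, out_port):
        self.packets.append((packet_id, out_port))

    def step(self, free_ports):
        """Run one arbitration cycle; return the served packet id or None.

        `free_ports` models the out-ports for which a grant would be given.
        Only one request is handled per cycle, as in the text.
        """
        if not self.packets:
            return None
        head_id, head_port = self.packets[0]
        if head_port in free_ports:
            # Normal case (also covers re-checking a previously blocked head).
            self.wait = 0
            self.packets.pop(0)
            return head_id
        self.wait += 1
        if self.wait < self.timeout:
            return None          # timer still running, keep waiting
        # Timer expired: the head's out-port is flagged as blocked, so look
        # for a later packet that requests a different, free out-port.
        for index in range(1, len(self.packets)):
            packet_id, out_port = self.packets[index]
            if out_port != head_port and out_port in free_ports:
                del self.packets[index]
                return packet_id
        return None


rab = RandomAccessBuffer(timeout=2)
rab.enqueue("A", "North")        # head packet, its out-port is blocked
rab.enqueue("B", "East")         # later packet with a free out-port
served = [rab.step({"East"}), rab.step({"East"}), rab.step({"North", "East"})]
```

In the usage example, packet A waits for North, the timer expires on the second cycle, packet B is served out of order through East, and A is served once North becomes available, mirroring steps (1) to (4) of the description.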
Despite the delay penalty required by the timer to detect the deadlock, this technique is still faster and simpler to implement than virtual channels. As long as the chosen route is minimal, livelock cannot occur either. However, it can be observed when a nonminimal direction is selected. For this reason,

some restrictions are added when selecting the nonminimal route, in addition to the ones mentioned above. The first restriction forbids the flit from turning back in the direction it came from. The second one forbids selecting a path in the direction opposite to the faulty link (i.e., if East is faulty then West should not be selected). Adopting these restrictions guarantees the livelock freedom of LAFT, and the flits continue to advance and search for a route until they find a valid link.

Figure 8. Panels (1), (2), (3), (4).

Performance evaluation: (a) Stall count evaluation, (b) Latency per flit, (c) Throughput.

IV. Evaluation

A. Evaluation methodology

The proposed deadlock-recovery technique is implemented on the 3D-OASIS-NoC system [19], [20], which was designed in Verilog HDL, then synthesized and prototyped using commercial CAD tools and an FPGA board [25]. We evaluate the hardware complexity of the LAFT router in terms of area utilization, power consumption (static and dynamic) and speed. To evaluate the performance of the proposed algorithm, we selected Matrix-multiplication [26], [27] as a real benchmark and also two traffic patterns: Transpose [28] and Uniform [29]. We chose Matrix-multiplication because it is one of the most fundamental problems in computer science and mathematics, and forms the core of many important algorithms in engineering and image processing applications [26], [27]. To evaluate the performance of the 3D-OASIS-NoC system with Matrix-multiplication, we set the matrix size to 6x6. We also decided to calculate from 1 to 100 different matrices at the same time. This increases the number of flits traveling through the network at the same time and shows the impact of congestion on the performance of the proposed system under different traffic loads.

The Transpose traffic pattern is a communication method based on matrix transposition. Each node sends messages to another node with the address of the reversed dimension index [28].
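As a small illustration of the Transpose pattern, the sketch below maps each source node to its destination by reversing the order of its coordinates. Interpreting "reversed dimension index" as (x, y, z) to (z, y, x) is an assumption made for this example, as are the function names; the 4x4x4 mesh size is the one used later in the evaluation.

```python
from itertools import product


def transpose_destination(source):
    """Destination under the assumed reading of Transpose: (x, y, z) -> (z, y, x)."""
    x, y, z = source
    return (z, y, x)


def transpose_pairs(size):
    """All (source, destination) pairs for a size x size x size mesh."""
    nodes = product(range(size), repeat=3)
    return [(node, transpose_destination(node)) for node in nodes]


pairs = transpose_pairs(4)
# Nodes with x == z are mapped onto themselves under this interpretation.
self_pairs = [src for src, dst in pairs if src == dst]
```

Under this reading, 16 of the 64 nodes in a 4x4x4 mesh map onto themselves, so a traffic generator would need to decide how to treat them; the source does not say how this case is handled.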
The Transpose workload is often used to evaluate NoC throughput and power consumption, since it creates a bottleneck due to the long communication distance between (transmitter, receiver) pairs. The Uniform traffic pattern is a standard benchmark used in on-chip and off-chip network routing studies, and can be considered the traffic model for well-balanced shared-memory computations [29]. Each node sends messages to

other nodes with an equal probability (i.e., destination nodes are chosen randomly using a uniform probability distribution function). In our evaluation with the two traffic patterns, we set the network size to 4x4x4, where all the nodes act as both transmitter and receiver nodes. Each transmitter node injects from $10^2$ to $10^5$ flits into the network, while the receiver nodes verify the correctness of the received flits. Using these three benchmarks, we evaluated the latency per flit and the throughput of each application.

We observed the performance variation of the proposed system under different link fault rates (0%, 1%, 5%, 10%, 15% and 20%). The number of links in each system can be calculated using this formula [30]:

$$\#\mathrm{links} = N_1 N_2 (N_3 - 1) + N_1 N_3 (N_2 - 1) + N_2 N_3 (N_1 - 1) \tag{1}$$

where $N_1$, $N_2$ and $N_3$ are the network's X, Y and Z dimensions, respectively. During the evaluation, we divided the faults into two categories: half of the faults are permanent (present during the whole simulation time) and the other half are transient (they randomly start and end during the simulation time). In addition, as the fault rate increases, we place more faults on the flits' paths to cause nonminimal routing and observe the system behavior in a worst-case environment. All the results obtained with the proposed RAB-enabled system were compared with our previously proposed algorithm LAFT [22] and also with Dimension Order Routing XYZ [31], [32]. Table I (simulation configuration) lists the configuration parameters used for our evaluation.
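As a worked check of Eq. (1), the following Python sketch computes the number of links for a given mesh and derives how many links would be faulty at each of the evaluated fault rates. The formula is the one given in the text; the rounding of fractional fault counts to the nearest integer and the way an odd count is split between permanent and transient faults are assumptions, since the source does not specify them. The function names are hypothetical.

```python
def link_count(n1, n2, n3):
    """Number of links in an n1 x n2 x n3 mesh, following Eq. (1)."""
    return n1 * n2 * (n3 - 1) + n1 * n3 * (n2 - 1) + n2 * n3 * (n1 - 1)


def faulty_links(total_links, rate_percent):
    """Faulty link count for an integer fault rate in percent.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return (total_links * rate_percent + 50) // 100


def split_faults(count):
    """Split faults into (permanent, transient) halves; for odd counts the
    extra fault is assigned to the transient group (arbitrary choice)."""
    permanent = count // 2
    return permanent, count - permanent


total = link_count(4, 4, 4)                      # 4x4x4 mesh from the text
per_rate = {rate: faulty_links(total, rate) for rate in (0, 1, 5, 10, 15, 20)}
```

For the 4x4x4 network each of the three terms equals 4 * 4 * 3 = 48, giving 144 links in total, so a 10% fault rate corresponds to about 14 faulty links under the rounding assumed here.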
Parameters / System           | Benchmark    | LAFT-based     | LAFT-based + RAB | XYZ-based
------------------------------+--------------+----------------+------------------+---------------
Network size (Mesh)           | JPEG         | 2x2x2          | 2x2x2            | 2x2x2
                              | Matrix (3x3) | 3x3x3          | 3x3x3            | 3x3x3
                              | Matrix (6x6) | 3x6x6          | 3x6x6            | 3x6x6
                              | Transpose    | 3x3x3          | 3x3x3            | 3x3x3
Flit size                     | JPEG         | 34 bits        | 27 bits          | 30 bits
                              | Matrix       | 38 bits        | 31 bits          | 33 bits
                              | Transpose    | 38 flit        | 31 flit          | 33 flit
Header size                   | JPEG         | 16 bits        | 9 bits           | 12 bits
                              | Matrix       | 16 bits        | 9 bits           | 12 bits
                              | Transpose    | 16 bits        | 9 bits           | 12 bits
Payload size                  | JPEG         | 16 bits        | 16 bits          | 16 bits
                              | Matrix       | 21 bits        | 21 bits          | 21 bits
                              | Transpose    | 21 bits        | 21 bits          | 21 bits
Buffer depth                  |              |                |                  |
Switching                     |              | Wormhole-like  | Wormhole-like    | Wormhole-like
Flow control                  |              | Stall-Go       | Stall-Go         | Stall-Go
Scheduling                    |              | Matrix-Arbiter | Matrix-Arbiter   | Matrix-Arbiter
Routing                       |              | LA-XYZ         | XYZ              | RPM
Target FPGA device            |              | Stratix III    | Stratix III      | Stratix III
Target Structured-ASIC device |              | HardCopy III   | HardCopy III     | HardCopy III

B. Hardware complexity evaluation

C. Performance evaluation

1) Communication latency evaluation:

2) Throughput evaluation:

Table II Hardware complexity comparison results.

Target device   | System         | Area (ALUTs) | Static Power (mW) | Speed (MHz)
----------------+----------------+--------------+-------------------+------------
FPGA            | LAFT-based     |              |                   |
                | LAFT-based+RAB |              |                   |
                | XYZ-based      |              |                   |
Structured-ASIC | LAFT-based     |              |                   |
                | LAFT-based+RAB |              |                   |
                | XYZ-based      |              |                   |

Look-ahead Local Hybrid

V. Conclusion

References

[1] F. N. Sibai. On-Chip Network for Interconnecting Thousands of Cores. IEEE Transactions on Parallel and Distributed Systems, 23(2), February.
[2] A. Ben Abdallah and M. Sowa. Basic Network-on-Chip Interconnection for Future Gigascale MCSoCs Applications: Communication and Computation Orthogonalization. Proceedings of The TJASSST2006 Symposium on Science, December.
[3] X. Wu, W. Zhao, M. Nakamoto, C. Nimmagadda, D. Lisk, S. Gu, R. Radojcic, M. Nowak and Y. Xie. Electrical Characterization for Intertier Connections and Timing Analysis for 3-D ICs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 20(1), January.
[4] G. Philip, B. Christopher, and P. Ramm. Handbook of 3D Integration: Technology and Applications of 3D Integrated Circuits. Wiley-VCH.
[5] Y. Xie, G. H. Loh, B. Black and K. Bernstein. Design Space Exploration for 3D Architectures. ACM Journal on Emerging Technologies in Computing Systems, 2(2):65-103, April.
[6] A. W. Topol, J. D. C. La Tulipe, L. Shi, D. J. Frank, K. Bernstein, S. E. Steen, A. Kumar, G. U. Singco, A. M. Young, K. W. Guarini and M. Ieong. Three-dimensional Integrated Circuits. IBM Journal of Research and Development, 50(4/5), July.
[7] X. Dong, X. Wu, G. Sun, Y. Xie, H. Li and Y. Chen. Circuit and Microarchitecture Evaluation of 3D Stacking Magnetic RAM (MRAM) as a Universal Memory Replacement. Proceedings of the 45th Annual Design Automation Conference, June.
[8] G. Sun, X. Dong, Y. Xie, J. Li and Y. Chen. A Novel 3D Stacked MRAM Cache Architecture for CMPs. IEEE 15th International Symposium on High Performance Computer Architecture, February.
[9] L. Benini and G. De Micheli. Networks on Chips: Technology and Tools. Morgan Kaufmann.
[10] T. Lehtonen, P. Liljeberg and J. Plosila. Online Reconfigurable Self-timed Links for Fault Tolerant NoC. VLSI Design, (2007):1-13, 2007.
[11] A.-M. Rahmani, K. R. Vaddina, K. Latif, P. Liljeberg, J. Plosila and H. Tenhunen. Design and Management of High-performance, Reliable and Thermal-aware 3D Networks-on-Chip. IET Circuits, Devices & Systems, 6(5), September.
[12] A. A. Chien and J. H. Kim. Planar-adaptive Routing: Low-cost Adaptive Networks for Multiprocessors. The 19th Annual International Symposium on Computer Architecture.
[13] J. Wu. Fault-tolerant Adaptive and Minimal Routing in Mesh-connected Multicomputer Using Extended Safety Levels. IEEE Transactions on Parallel and Distributed Systems, 11(2), February.
[14] J. Wu. A Fault-tolerant Adaptive and Minimal Routing Approach in 3-D Meshes. The 7th International Conference on Parallel and Distributed Systems, July.
[15] W. J. Dally. Virtual-channel Flow Control. IEEE Transactions on Parallel and Distributed Systems, 3(2), March.
[16] Y. Tamir and G. L. Frazier. High-performance Multiqueue Buffers for VLSI Communication Switches. 15th Annual International Symposium on Computer Architecture, May-June.
[17] Y. Zhang and J. Hu. A DFTR Router Architecture for 3D Network on Chip. 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT), July 2010.
[18] S. Pasricha and Y. Zou. A Low Overhead Fault Tolerant Routing Scheme for 3D Networks-on-Chip. The 12th International Symposium on Quality Electronic Design, pages 1-8, March.
[19] A. Ben Ahmed, A. Ben Abdallah and K. Kuroda. Architecture and Design of Efficient 3D Network-on-Chip (3D NoC) for Custom Multicore SoC. IEEE Proceedings of the 5th International Conference on Broadband, Wireless Computing, Communication and Applications, pages 67-73, November.
[20] A. Ben Ahmed and A. Ben Abdallah. LA-XYZ: Low Latency, High Throughput Look-Ahead Routing Algorithm for 3D Network-on-Chip (3D-NoC) Architecture. The 6th IEEE International Symposium on Embedded Multicore SoCs, September.
[21] A. Ben Ahmed and A. Ben Abdallah. Low-overhead Routing Algorithm for 3D Network-on-Chip. IEEE Proceedings of The Third International Conference on Networking and Computing, December.
[22] A. Ben Ahmed and A. Ben Abdallah. Architecture and Design of High-throughput, Low-latency, and Fault-Tolerant Routing Algorithm for 3D-Network-on-Chip (3D-NoC). To be published in the Journal of Supercomputing, DOI: /s
[23] A. Kumar, P. Kundu, A. P. Singh, L.-S. Peh and N. K. Jha. A 4.6Tbits/s 3.6GHz Single-cycle NoC Router with a Novel Switch Allocator in 65nm CMOS. Proceedings of the 2007 IEEE International Conference on Computer Design, pages 63-70, October.
[24] L. Xin and C.-S. Choy. A Low-latency NoC Router with Lookahead Bypass. IEEE International Symposium on Circuits and Systems, May-June.
[25]
[26] P. Chan, K. Dai, D. Wu, J. Rao and X. Zou. The Parallel Algorithm Implementation of Matrix Multiplication Based on ESCA. IEEE Asia Pacific Conference on Circuits and Systems, December.
[27] A. S. Zekri and S. G. Sedukhin. The General Matrix Multiply-Add Operation on 2D Torus. In the 20th IEEE International Parallel and Distributed Processing Symposium, April.
[28] A. A. Chien and J. H. Kim. Planar-Adaptive Routing: Low-Cost Adaptive Networks for Multiprocessors. Journal of the ACM, 42(1):91-123, January.
[29] A. M. Rahmani, A. A. Kusha and M. Pedram. NED: A Novel Synthetic Traffic Pattern for Power/Performance Analysis of Network-on-Chips Using Negative Exponential Distribution. Journal of Low Power Electronics, American Scientific Publishers, 5(3).
[30] B. Feero and P. P. Pande. Performance Evaluation for Three-Dimensional Networks-on-Chip. Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), May.
[31] H. Sullivan and T. R. Bashkow. Large Scale, Homogeneous, Fully Distributed Parallel Machine. Annual Symposium on Computer Architecture, ACM Press, March.
[32] C. H. Chao, K. Y. Jheng, H. Y. Wang, J. C. Wu and A.-Y. Wu. Traffic and Thermal-aware Run-time Thermal Management Scheme for 3D NoC Systems. In Proceedings of the ACM/IEEE International Symposium on Networks-on-Chip (NoCS), May.


More information

Efficient Multicast Communication using 3d Router

Efficient Multicast Communication using 3d Router International Journal of Emerging Engineering Research and Technology Volume 3, Issue 12, December 2015, PP 38-49 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Efficient Multicast Communication using

More information

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information

Power and Area Efficient NOC Router Through Utilization of Idle Buffers

Power and Area Efficient NOC Router Through Utilization of Idle Buffers Power and Area Efficient NOC Router Through Utilization of Idle Buffers Mr. Kamalkumar S. Kashyap 1, Prof. Bharati B. Sayankar 2, Dr. Pankaj Agrawal 3 1 Department of Electronics Engineering, GRRCE Nagpur

More information

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK DOI: 10.21917/ijct.2012.0092 HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK U. Saravanakumar 1, R. Rangarajan 2 and K. Rajasekar 3 1,3 Department of Electronics and Communication

More information

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Maheswari Murali * and Seetharaman Gopalakrishnan # * Assistant professor, J. J. College of Engineering and Technology,

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

Lecture 3: Flow-Control

Lecture 3: Flow-Control High-Performance On-Chip Interconnects for Emerging SoCs http://tusharkrishna.ece.gatech.edu/teaching/nocs_acaces17/ ACACES Summer School 2017 Lecture 3: Flow-Control Tushar Krishna Assistant Professor

More information

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links Hoda Naghibi Jouybari College of Electrical Engineering, Iran University of Science and Technology, Tehran,

More information

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect 1 A Soft Tolerant Network-on-Chip Router Pipeline for Multi-core Systems Pavan Poluri and Ahmed Louri Department of Electrical and Computer Engineering, University of Arizona Email: pavanp@email.arizona.edu,

More information

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID 1 Virtual Channel Flow Control Each switch has multiple virtual channels per phys. channel Each virtual

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

ScienceDirect. Packet-based Adaptive Virtual Channel Configuration for NoC Systems

ScienceDirect. Packet-based Adaptive Virtual Channel Configuration for NoC Systems Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 34 (2014 ) 552 558 2014 International Workshop on the Design and Performance of Network on Chip (DPNoC 2014) Packet-based

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution

Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution Nishant Satya Lakshmikanth sailtosatya@gmail.com Krishna Kumaar N.I. nikrishnaa@gmail.com Sudha S

More information

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections A.SAI KUMAR MLR Group of Institutions Dundigal,INDIA B.S.PRIYANKA KUMARI CMR IT Medchal,INDIA Abstract Multiple

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information

A Literature Review of on-chip Network Design using an Agent-based Management Method

A Literature Review of on-chip Network Design using an Agent-based Management Method A Literature Review of on-chip Network Design using an Agent-based Management Method Mr. Kendaganna Swamy S Dr. Anand Jatti Dr. Uma B V Instrumentation Instrumentation Communication Bangalore, India Bangalore,

More information

Temperature and Traffic Information Sharing Network in 3D NoC

Temperature and Traffic Information Sharing Network in 3D NoC , October 2-23, 205, San Francisco, USA Temperature and Traffic Information Sharing Network in 3D NoC Mingxing Li, Ning Wu, Gaizhen Yan and Lei Zhou Abstract Monitoring Network on Chip (NoC) status, such

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

A Novel Topology-Independent Router Architecture to Enhance Reliability and Performance of Networks-on-Chip

A Novel Topology-Independent Router Architecture to Enhance Reliability and Performance of Networks-on-Chip A Novel Topology-Independent Router Architecture to Enhance Reliability and Performance of Networks-on-Chip Khalid Latif 1,2, Amir-Mohammad Rahmani 1,2, Ethiopia Nigussie 1, Hannu Tenhunen 1,2 1 Department

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER A Thesis by SUNGHO PARK Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

Low-Power Interconnection Networks

Low-Power Interconnection Networks Low-Power Interconnection Networks Li-Shiuan Peh Associate Professor EECS, CSAIL & MTL MIT 1 Moore s Law: Double the number of transistors on chip every 2 years 1970: Clock speed: 108kHz No. transistors:

More information

Demand Based Routing in Network-on-Chip(NoC)

Demand Based Routing in Network-on-Chip(NoC) Demand Based Routing in Network-on-Chip(NoC) Kullai Reddy Meka and Jatindra Kumar Deka Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, India Abstract

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

Improving Fault Tolerance of Network-on-Chip Links via Minimal Redundancy and Reconfiguration

Improving Fault Tolerance of Network-on-Chip Links via Minimal Redundancy and Reconfiguration Improving Fault Tolerance of Network-on-Chip Links via Minimal Redundancy and Reconfiguration Hamed S. Kia, and Cristinel Ababei Department of Electrical and Computer Engineering North Dakota State University

More information

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU Thomas Moscibroda Microsoft Research Onur Mutlu CMU CPU+L1 CPU+L1 CPU+L1 CPU+L1 Multi-core Chip Cache -Bank Cache -Bank Cache -Bank Cache -Bank CPU+L1 CPU+L1 CPU+L1 CPU+L1 Accelerator, etc Cache -Bank

More information

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal Lecture 19 Interconnects: Flow Control Winter 2018 Subhankar Pal http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin, Narayanasamy, Nowatzyk,

More information

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

Design and Implementation of Buffer Loan Algorithm for BiNoC Router Design and Implementation of Buffer Loan Algorithm for BiNoC Router Deepa S Dev Student, Department of Electronics and Communication, Sree Buddha College of Engineering, University of Kerala, Kerala, India

More information

WITH the development of the semiconductor technology,

WITH the development of the semiconductor technology, Dual-Link Hierarchical Cluster-Based Interconnect Architecture for 3D Network on Chip Guang Sun, Yong Li, Yuanyuan Zhang, Shijun Lin, Li Su, Depeng Jin and Lieguang zeng Abstract Network on Chip (NoC)

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC 1 Pawar Ruchira Pradeep M. E, E&TC Signal Processing, Dr. D Y Patil School of engineering, Ambi, Pune Email: 1 ruchira4391@gmail.com

More information

Evaluation of NOC Using Tightly Coupled Router Architecture

Evaluation of NOC Using Tightly Coupled Router Architecture IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 01-05 www.iosrjournals.org Evaluation of NOC Using Tightly Coupled Router

More information

Design and Implementation of Multistage Interconnection Networks for SoC Networks

Design and Implementation of Multistage Interconnection Networks for SoC Networks International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.5, October 212 Design and Implementation of Multistage Interconnection Networks for SoC Networks Mahsa

More information

The Design and Implementation of a Low-Latency On-Chip Network

The Design and Implementation of a Low-Latency On-Chip Network The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current

More information

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes EE482, Spring 1999 Research Paper Report Deadlock Recovery Schemes Jinyung Namkoong Mohammed Haque Nuwan Jayasena Manman Ren May 18, 1999 Introduction The selected papers address the problems of deadlock,

More information

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS 1 JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS Shabnam Badri THESIS WORK 2011 ELECTRONICS JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

More information

Design and Analysis of On-Chip Router for Network On Chip

Design and Analysis of On-Chip Router for Network On Chip Design and Analysis of On-Chip Router for Network On Chip Ms. A.S. Kale #1 M.Tech IInd yr, Electronics Department, Bapurao Deshmukh college of engineering, Wardha M. S.India Prof. M.A.Gaikwad #2 Professor,

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

Highly Resilient Minimal Path Routing Algorithm for Fault Tolerant Network-on-Chips

Highly Resilient Minimal Path Routing Algorithm for Fault Tolerant Network-on-Chips Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 3406 3410 Advanced in Control Engineering and Information Science Highly Resilient Minimal Path Routing Algorithm for Fault Tolerant

More information

A Novel Energy Efficient Source Routing for Mesh NoCs

A Novel Energy Efficient Source Routing for Mesh NoCs 2014 Fourth International Conference on Advances in Computing and Communications A ovel Energy Efficient Source Routing for Mesh ocs Meril Rani John, Reenu James, John Jose, Elizabeth Isaac, Jobin K. Antony

More information

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip Anh T. Tran and Bevan M. Baas Department of Electrical and Computer Engineering University of California - Davis, USA {anhtr,

More information

A Strategy for Interconnect Testing in Stacked Mesh Network-on- Chip

A Strategy for Interconnect Testing in Stacked Mesh Network-on- Chip 2010 25th International Symposium on Defect and Fault Tolerance in VLSI Systems A Strategy for Interconnect Testing in Stacked Mesh Network-on- Chip Min-Ju Chan and Chun-Lung Hsu Department of Electrical

More information

CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION. Stephen Chui Bachelor of Engineering Ryerson University, 2012.

CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION. Stephen Chui Bachelor of Engineering Ryerson University, 2012. CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION by Stephen Chui Bachelor of Engineering Ryerson University, 2012 A thesis presented to Ryerson University in partial fulfillment of the

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip Nasibeh Teimouri

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Design of a router for network-on-chip. Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh

Design of a router for network-on-chip. Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh 98 Int. J. High Performance Systems Architecture, Vol. 1, No. 2, 27 Design of a router for network-on-chip Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh Department of Electrical Engineering and Computer

More information

Asynchronous Bypass Channel Routers

Asynchronous Bypass Channel Routers 1 Asynchronous Bypass Channel Routers Tushar N. K. Jain, Paul V. Gratz, Alex Sprintson, Gwan Choi Department of Electrical and Computer Engineering, Texas A&M University {tnj07,pgratz,spalex,gchoi}@tamu.edu

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization

Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization Basic Network-on-Chip (BANC) interconnection for Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization Abderazek Ben Abdallah, Masahiro Sowa Graduate School of Information

More information

Low Cost Network on Chip Router Design for Torus Topology

Low Cost Network on Chip Router Design for Torus Topology IJCSNS International Journal of Computer Science and Network Security, VOL.17 No.5, May 2017 287 Low Cost Network on Chip Router Design for Torus Topology Bouraoui Chemli and Abdelkrim Zitouni Electronics

More information

ISSN Vol.03,Issue.06, August-2015, Pages:

ISSN Vol.03,Issue.06, August-2015, Pages: WWW.IJITECH.ORG ISSN 2321-8665 Vol.03,Issue.06, August-2015, Pages:0920-0924 Performance and Evaluation of Loopback Virtual Channel Router with Heterogeneous Router for On Chip Network M. VINAY KRISHNA

More information

SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS*

SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS* SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS* Young-Joo Suh, Binh Vien Dao, Jose Duato, and Sudhakar Yalamanchili Computer Systems Research Laboratory Facultad de Informatica School

More information

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 02, 2014 ISSN (online): 2321-0613 Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS Chandrika D.N 1, Nirmala. L 2 1 M.Tech Scholar, 2 Sr. Asst. Prof, Department of electronics and communication engineering, REVA Institute

More information

Lecture 22: Router Design

Lecture 22: Router Design Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO 03, Princeton A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip

More information

ES1 An Introduction to On-chip Networks

ES1 An Introduction to On-chip Networks December 17th, 2015 ES1 An Introduction to On-chip Networks Davide Zoni PhD mail: davide.zoni@polimi.it webpage: home.dei.polimi.it/zoni Sources Main Reference Book (for the examination) Designing Network-on-Chip

More information

NED: A Novel Synthetic Traffic Pattern for Power/Performance Analysis of Network-on-chips Using Negative Exponential Distribution

NED: A Novel Synthetic Traffic Pattern for Power/Performance Analysis of Network-on-chips Using Negative Exponential Distribution To appear in Int l Journal of Low Power Electronics, American Scientific Publishers, 2009 NED: A Novel Synthetic Traffic Pattern for Power/Performance Analysis of Network-on-chips Using Negative Exponential

More information

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Philipp Gorski, Tim Wegner, Dirk Timmermann University

More information

DESIGN AND IMPLEMENTATION ARCHITECTURE FOR RELIABLE ROUTER RKT SWITCH IN NOC

DESIGN AND IMPLEMENTATION ARCHITECTURE FOR RELIABLE ROUTER RKT SWITCH IN NOC International Journal of Engineering and Manufacturing Science. ISSN 2249-3115 Volume 8, Number 1 (2018) pp. 65-76 Research India Publications http://www.ripublication.com DESIGN AND IMPLEMENTATION ARCHITECTURE

More information

LOW POWER REDUCED ROUTER NOC ARCHITECTURE DESIGN WITH CLASSICAL BUS BASED SYSTEM

LOW POWER REDUCED ROUTER NOC ARCHITECTURE DESIGN WITH CLASSICAL BUS BASED SYSTEM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 5, May 2015, pg.705

More information

Ultra-Fast NoC Emulation on a Single FPGA

Ultra-Fast NoC Emulation on a Single FPGA The 25 th International Conference on Field-Programmable Logic and Applications (FPL 2015) September 3, 2015 Ultra-Fast NoC Emulation on a Single FPGA Thiem Van Chu, Shimpei Sato, and Kenji Kise Tokyo

More information

A Hybrid Interconnection Network for Integrated Communication Services

A Hybrid Interconnection Network for Integrated Communication Services A Hybrid Interconnection Network for Integrated Communication Services Yi-long Chen Northern Telecom, Inc. Richardson, TX 7583 kchen@nortel.com Jyh-Charn Liu Department of Computer Science, Texas A&M Univ.

More information

PERFORMANCE EVALUATION OF FAULT TOLERANT METHODOLOGIES FOR NETWORK ON CHIP ARCHITECTURE

PERFORMANCE EVALUATION OF FAULT TOLERANT METHODOLOGIES FOR NETWORK ON CHIP ARCHITECTURE PERFORMANCE EVALUATION OF FAULT TOLERANT METHODOLOGIES FOR NETWORK ON CHIP ARCHITECTURE By HAIBO ZHU A thesis submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN

More information

Tackling Permanent Faults in the Network-on-Chip Router Pipeline

Tackling Permanent Faults in the Network-on-Chip Router Pipeline 2013 25th International Symposium on Computer Architecture and High Performance Computing Tackling Permanent Faults in the Network-on-Chip Router Pipeline Pavan Poluri Department of Electrical and Computer

More information

A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin

A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin 50 IEEE EMBEDDED SYSTEMS LETTERS, VOL. 1, NO. 2, AUGUST 2009 A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin Abstract Programmable many-core processors are poised

More information

NOC Deadlock and Livelock

NOC Deadlock and Livelock NOC Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

Partitioning Methods for Multicast in Bufferless 3D Network on Chip

Partitioning Methods for Multicast in Bufferless 3D Network on Chip Partitioning Methods for Multicast in Bufferless 3D Network on Chip Chaoyun Yao 1(B), Chaochao Feng 1, Minxuan Zhang 1, Wei Guo 1, Shouzhong Zhu 1, and Shaojun Wei 2 1 College of Computer, National University

More information

AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIP

AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIP AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIP Rehan Maroofi, 1 V. N. Nitnaware, 2 and Dr. S. S. Limaye 3 1 Department of Electronics, Ramdeobaba Kamla Nehru College of Engg, Nagpur,

More information

High Performance Interconnect and NoC Router Design

High Performance Interconnect and NoC Router Design High Performance Interconnect and NoC Router Design Brinda M M.E Student, Dept. of ECE (VLSI Design) K.Ramakrishnan College of Technology Samayapuram, Trichy 621 112 brinda18th@gmail.com Devipoonguzhali

More information

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari Global Journal of Computer Science and Technology: E Network, Web & Security Volume 15 Issue 6 Version 1.0 Year 2015 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Interconnection Networks: Routing. Prof. Natalie Enright Jerger Interconnection Networks: Routing Prof. Natalie Enright Jerger Routing Overview Discussion of topologies assumed ideal routing In practice Routing algorithms are not ideal Goal: distribute traffic evenly

More information

Networks-on-Chip Router: Configuration and Implementation

Networks-on-Chip Router: Configuration and Implementation Networks-on-Chip : Configuration and Implementation Wen-Chung Tsai, Kuo-Chih Chu * 2 1 Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 413, Taiwan,

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information
