NoC Test-Chip Project: Working Document

Michele Petracca, Omar Ahmad, Young Jin Yoon, Frank Zovko, Luca Carloni and Kenneth Shepard

I. INTRODUCTION

This document describes the low-power, high-performance network-on-chip (NoC) architecture that we propose to connect 16 processing cores in the Test-Chip, which will be realized in a 90nm technology process. The main purpose of the Test-Chip is to test the functionality and performance of the long-distance links (LDLs) when they operate as part of a fairly complex NoC. The LDLs employ current-mode and low-swing signaling techniques to achieve low-power communication at a near-speed-of-light transmission rate, and they are expected to offer a major improvement in NoC performance per watt. Channels made of such LDLs can have very high bandwidth (10Gbps) and span an end-to-end physical distance of 1cm or more.

Figure 1 illustrates the proposed architecture for the Test-Chip. All 16 processors run concurrently, each executing a simple sequential program that is stored in the local RAM together with the necessary data. Occasionally the processors exchange data through the network by using a DMA mechanism. Each processor is connected to the NoC through two network interfaces (NIs), which are responsible for sending and receiving data and control messages. A data message can be segmented into a sequence of packets. A control message is usually small: it is one clock cycle long and as wide as the channel.

The chip alternates between two main modes: configuration and execution. In configuration mode the NoC and the processors are off, while program code and data are stored in the local RAMs using an external FPGA. In execution mode the chip is effectively a closed system that does not communicate with the outside world. Various communication scenarios can be tested by properly programming the processors.
Essentially, a testing session consists of a sequence of three steps:

1) Uploading: the Test-Chip is in configuration mode. Programs and data are loaded onto the RAMs. After the uploading is complete, the overall content of the 16 RAMs is called the starting configuration. At this point the processors receive a signal from the external FPGA to start execution.
2) Run: the Test-Chip is in execution mode. Each processor runs its program, processing local data and exchanging data with other processors. Each processor notifies the external FPGA as soon as it completes the program execution, and then stops and waits. After all the processors have notified their completion, the RAMs contain the final configuration and the Test-Chip can be switched back to configuration mode.
3) Downloading: the Test-Chip is in configuration mode. The final configuration is downloaded from the RAMs and then compared against an expected final configuration to validate the correctness of the computation.

II. TOPOLOGY

The proposed NoC topology, shown in Figure 1, combines a standard packet-switched mesh with a circuit-switched fully-connected network based on LDLs. The mesh transports short data messages and control messages, while the circuit-switched network conveys large DMA transfers or, possibly, high-priority data. The mesh is a 16-node network, while the circuit-switched network has only 4 nodes. Each network link is bidirectional.

The two networks connect 16 tiles, each containing a processing core. Each tile is equipped with two network interfaces (NIs). One NI connects the processor to a router of the mesh, while the other NI connects it to one of the four low-swing network nodes. Different messages are exchanged on the two networks, based on the software program running on the local processor. Notice that any given message exchange uses only one of the two networks.
Each tile also contains a 5x5 router: 4 bidirectional ports relay packets, while the fifth port is used by the processor to inject (eject) messages into (from) the packet-switched mesh network. The packet-switched mesh network is thus composed of 16 routers, each connected to a processor.

The circuit-switched network has a different architecture. It has just 4 low-swing network nodes, which manage the injection and ejection of traffic over the circuit-switched network. Each node is attached to 4 different processors and communicates directly with the other 3 low-swing network nodes through LDLs. The processor layout can be divided into 4 square quadrants, each containing 4 processors that are attached to the same low-swing network node. The node includes an arbitration mechanism to grant each of its 3 low-swing interfaces to one of the 4 processors. The group of 4 processors together with its low-swing network node is called a low-swing island.

Fig. 1. Proposed Architecture for the NoC Test Chip.

This topology is innovative. It combines a packet-switched network with a circuit-switched network, a pairing that has lately been advocated by many NoC researchers. The packet-switched network uses traditional full-swing signaling and is used for short data/control packets. The circuit-switched network offers a low-latency, high-bandwidth, low-power solution for the exchange of large messages. It is interesting to analyze the possibility of having a higher bandwidth on the LDLs than on the mesh links, because on the LDLs the power-dissipation penalty of increasing the bandwidth is lower. It is also interesting to carry out a performance analysis to find the best way of distributing the traffic across the two networks.

The LDLs are used to connect pairs of processors from different low-swing islands, because in general the number of mesh hops between them is high. For processors at a distance of 1 or 2 hops the packet-switched network can be used. This leads to lower power dissipation for long-distance communications.
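The island structure and the distance-based choice between the two networks can be illustrated with a short Python sketch. It rests on assumptions that are not in the proposal: tiles are numbered 0 to 15 in row-major order, the helper names are hypothetical, and the 1-or-2-hop guideline is turned into a hard rule that ignores message size. In the proposed design the network is actually selected by the software running on each processor.

```python
MESH_SIDE = 4  # 4x4 mesh of 16 tiles

def coords(tile):
    """(x, y) position of a tile, assuming row-major numbering 0..15."""
    return tile % MESH_SIDE, tile // MESH_SIDE

def island(tile):
    """Index (0..3) of the 2x2 quadrant, i.e. the low-swing island, of a tile."""
    x, y = coords(tile)
    return (y // 2) * 2 + (x // 2)

def mesh_hops(src, dst):
    """Number of mesh links between two tiles (Manhattan distance)."""
    (sx, sy), (dx, dy) = coords(src), coords(dst)
    return abs(sx - dx) + abs(sy - dy)

def suggest_network(src, dst, short_hops=2):
    """Hypothetical policy: use an LDL only across islands and beyond short_hops."""
    if island(src) != island(dst) and mesh_hops(src, dst) > short_hops:
        return "circuit"
    return "packet"
```

With this numbering, tiles 0 and 15 sit in opposite corners (6 mesh hops, different islands), so the sketch suggests the circuit-switched network, while neighbouring tiles 1 and 2 belong to different islands but are only 1 hop apart, so the mesh is suggested.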

III. CORE ARCHITECTURE

To simplify the design task, we propose to build a software-configurable traffic generator instead of a fully functional processor. The ability to program the generation of different traffic patterns will allow us to perform in-the-field analysis of the NoC correctness and performance. Hence, the main components of our simplified processor are (see Figure 2):

- a local memory (RAM),
- a Processing Unit (PU),
- a DMA engine,
- a Full-Swing Network Interface (FSNI),
- a Low-Swing Network Interface (LSNI).

Fig. 2. Micro-Architecture of the Processing Core.

The RAM has 3 sections: 1) the code area, 2) the statistics area, 3) the data area. The RAM has one output port for the code and one I/O port for the data, so that a DMA engine can manage the memory while just one incoming communication and one outgoing communication are possible at any given time.

The PU executes a program that is stored in the code area of the RAM. The program contains a sequence of instructions that allow us to emulate a real program running on the processor and generating data transfers over the networks. For this purpose we have defined a simple assembly language (see Section VI).

The DMA engine is a functional block that reads a portion of data from the RAM, given the base address and the size of the data transfer as inputs. The output of the memory is then demultiplexed toward one of the two network interfaces, according to the network specified in the communication instruction.

The FSNI is the interface toward the packet-switched mesh network. This network carries small data messages and control messages. The small data messages are generated by the DMA, while the control messages are one flit long and are generated directly by the PU. The LSNI is the interface toward the low-swing circuit-switched network.
It forwards long data messages on this network, and it is also equipped with a control interface that allows the reservation of circuits when a transfer has to take place. The control messages related to a low-swing communication are carried by the packet network before the circuit transfer can happen.

The processor architecture is shown in Figure 2. The PU and the DMA read the code and the data from the RAM. The PU generates control messages to be sent over the packet network. Each control message corresponds to a distinct data transfer (either incoming or outgoing) that takes place over one of the two networks. The data read by the DMA engine from the RAM are forwarded to the appropriate network interface.

IV. LOW-SWING NETWORK NODE

The low-swing network node is a circuit that manages the traffic coming from the four tiles of its island. It is equipped with 4 tile interfaces (TIs) and 3 low-swing transceivers (LSTxRx). The TIs receive the communication requests from the processors and return the grants. The low-swing network node essentially behaves as an arbiter: it receives many requests and allocates a number of resources that is lower than the number of requesters. When a circuit interface is reserved (either a Tx or an Rx), a grant signal is sent to the tile that will send or receive on that circuit.

V. HIGH-LEVEL INTER-PROCESSOR COMMUNICATION

The Test-Chip is a distributed concurrent system where 16 processors compute concurrently and exchange data via either the packet-switched network or point-to-point channels on the circuit-switched network. From a model of computation (MOC) point of view, the system is a collection of sequential processes running concurrently. Each process is specified as a program in the local memory. The program is an interleaved sequence of computation and communication instructions.

During a communication phase two processors communicate through a rendezvous mechanism. In a two-processor rendezvous communication the sender processor writes a message into the RAM of the receiver processor. Both processors are running their own process, and each of these processes contains an instruction related to this communication: a SEND instruction in the sender and a RECEIVE instruction in the receiver. The two processes synchronize as they execute these instructions.

Two protocols are suitable to implement the control phase of a data transfer using a rendezvous scheme:

1) The first protocol is illustrated in Figure 3. When the destination reaches the RECEIVE instruction (at time $t_{rcv}$), it sends a RECEIVE control message to the sender to signal that it is ready to receive the data.
When the sender reaches the SEND instruction (at time $t_{snd}$), it waits for the reception of the RECEIVE control message (unless it has already arrived and has been buffered) before sending the message over the network.
2) The second protocol is illustrated in Figure 4. Here, the sender starts the communication by emitting a SEND control message when it reaches the SEND instruction. After executing the RECEIVE instruction, the receiver waits for the incoming SEND control message (unless it has already arrived and has been buffered). Then the LDL is requested from the low-swing network node and, when the link is granted, the RECEIVE control message is sent back. The sender starts to transfer the data just after receiving the RECEIVE control message.

The first protocol is faster because only one control message is sent on the network, and the time when the transfer starts ($t_{tx}$) is

$t_{tx} = \max(t_{snd},\ t_{rcv} + RTT/2)$,

where $RTT$ is the round-trip time for a message. The second protocol instead requires the exchange of two control messages, and it is slower because

$t_{tx} = \max(t_{snd} + d_{rsv} + RTT,\ t_{rcv} + d_{rsv} + RTT/2)$,

where $d_{rsv}$ is the delay imposed by the arbitration of the low-swing link.

The second protocol, however, has an advantage: if the RECEIVE control message is also used to reserve resources over the network, those resources are reserved only when both processors are ready for the communication. In the first protocol the reservation would happen when the receiver is ready, which could occur much earlier than the time the communication really starts. The trade-offs between the two protocols can be exploited by using a different protocol in each of the two networks, as discussed in Sections VIII and IX.

VI. ASSEMBLY LANGUAGE

In order to emulate a real program execution, together with its various possible communication traffic patterns, we propose to implement a simple instruction set that includes the following instructions:

- SEND (source address, destination processor, destination address, length, network)
- RECEIVE (source processor, destination address, network)
- WHILE (number of instructions, number of iterations)
- NOP (number of cycles)
- STOP

These instructions are sufficient to generate traffic patterns on both the packet-switched and the circuit-switched networks, as well as to implement communications that protect memory consistency through the rendezvous mechanism. When a process reaches a SEND, it has to write a message into the RAM of another processor. When a process reaches a RECEIVE, it has to wait for the reception of a message from another processor.
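Because every SEND is paired with a RECEIVE in the destination program, a traffic scenario can be sanity-checked before it is loaded. The sketch below shows one possible way to do this in Python. The class names, the field names, the string encoding of the network parameter and the check itself are illustrative assumptions, not part of the proposed tool flow, and the programs are assumed to be already reduced to their SEND and RECEIVE entries.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Send:
    src_addr: int
    dst_proc: int
    dst_addr: int
    length: int
    network: str  # "packet" or "circuit" (hypothetical encoding)

@dataclass(frozen=True)
class Receive:
    src_proc: int
    dst_addr: int
    network: str

def unmatched(programs):
    """Return (sender, receiver, network) keys whose SEND/RECEIVE counts differ.

    programs maps a processor id to its list of Send/Receive entries.
    """
    sends, receives = Counter(), Counter()
    for proc, program in programs.items():
        for instr in program:
            if isinstance(instr, Send):
                sends[(proc, instr.dst_proc, instr.network)] += 1
            elif isinstance(instr, Receive):
                receives[(instr.src_proc, proc, instr.network)] += 1
    keys = set(sends) | set(receives)
    return sorted(k for k in keys if sends[k] != receives[k])
```

An empty result means that each sender/receiver pair has as many SENDs as RECEIVEs on each network. The check does not look at the order of the instructions, so it cannot by itself prove that a scenario is deadlock-free.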

The WHILE instruction loops over a group of instructions executed just before the WHILE itself. Its two parameters indicate the number of instructions to consider and the number of iterations. This instruction makes a program more compact, saving RAM, and allows a traffic pattern to be repeated many times. The NOP instruction is used to emulate the computation delay. The STOP instruction ends a program.

Fig. 3. First rendezvous communication protocol.

VII. ROUTING AND FLOW CONTROL

The packet-switched network uses wormhole flow control, which is based on the segmentation of a packet into a train of flits. A flit is defined as a set of bits that matches the link parallelism. The first flit contains the information used by the routers to forward the packet. All the following flits are forwarded sequentially along the same path, following the first flit. If there is a resource conflict on a router port, the first flit can be blocked by another flit: in this case, a back-pressure mechanism stops the rest of the flits, which, depending on the length of the packet, can be stored in the router buffers and possibly also in the other routers along the path.

Wormhole flow control may lead to deadlock because, unlike store-and-forward mechanisms, resources are allocated along the path in more than a single router. In order to avoid deadlock it is necessary to avoid the creation of cyclic dependencies by carefully choosing the routing algorithm. One of the simplest routing algorithms for a packet-switched mesh network is XY dimension-order routing, which routes a packet first along the X dimension until it reaches the destination column, and then along the Y dimension until it reaches the final destination. XY dimension-order routing is known to be deadlock-free. It is possible to introduce more efficient (but also more complex) routing algorithms, and it is possible to make them deadlock-free by adding a proper number of virtual channels (VCs).
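The XY dimension-order rule described above can be sketched in Python as follows. The (x, y) coordinate convention and the function name are assumptions made for illustration; the sketch only computes the sequence of routers a head flit would visit and does not model flits, buffers or back-pressure.

```python
def xy_route(src, dst):
    """Return the list of (x, y) routers visited from src to dst, inclusive."""
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # First move along X until the destination column is reached.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Then move along Y until the destination router is reached.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path
```

For example, `xy_route((0, 0), (2, 1))` visits (0, 0), (1, 0), (2, 0) and then (2, 1). A packet never turns from the Y dimension back to the X dimension, which is the restriction that rules out cyclic dependencies in a mesh.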
VCs allow a packet to be forwarded on a port even if that port is the destination of a blocked packet, thereby also improving the network throughput. The insertion of VCs, however, not only adds complexity in terms of buffer management and router design, but also requires more buffering resources. For our purposes, since the mesh mainly carries control messages and small data packets, we plan to use a simple deadlock-free routing algorithm without introducing VCs.

VIII. TRANSFER PROTOCOL OVER THE PACKET-SWITCHED MESH NETWORK

As discussed in Section V, there is more than one high-level communication protocol that can be used to implement the rendezvous communication scheme. For the packet-switched network we plan to use the first protocol discussed above (Figure 3), because it uses just one control message and has a lower latency.

IX. CIRCUIT RESERVATION PROTOCOL

The LDLs are used in a circuit-switched network that is suitable for large and infrequent messages, because there is a higher overhead in setting up the path. Once a point-to-point communication circuit is reserved, there is no need to implement a flow control mechanism or to segment the message into multiple packets, because a dedicated channel is available to transfer all the data. Besides, the longer the message, the lower the relative overhead of the circuit set-up procedure.

For the circuit-switched network we use the second protocol proposed in Section V (see Figure 4). With this protocol the path set-up process can be carried by the RECEIVE control message. Before sending the RECEIVE message, the receiver processor must request the reservation of the Rx on the low-swing link that will be used for the communication and receive the grant for that link (if the link is already busy, a connection over the same link is taking place, so the new communication is postponed). When the RECEIVE control message is received, the sender can turn on the Tx on the low-swing link and send the data. The reservation protocol guarantees that no other communication is in progress on that link at the time the sender asks for it, because the Rx is already locked.

Fig. 4. Second rendezvous communication protocol.

The protocol above uses two control messages, thus increasing the load on the packet-switched network and the latency of the control phase. On the other hand, this protocol makes it possible to reserve the resources for the circuit communication, and the circuit communication has a negligible latency once it is set up, because it directly connects the two processors and no routing decisions (one or more clock cycles per router) have to be made.

The discussed protocol resolves conflicts among different communications toward the same destination. Suppose that processors A and C both need to send data to the same processor B, and that for both processors this operation is the next rendezvous point. In this case, B has two RECEIVE instructions in sequence, one for each sender, possibly with some computation between them. Suppose that the code of B contains RECEIVE(A) followed by RECEIVE(C), but that C executes its SEND(B) instruction before A does.
In this situation B stores the SEND message coming from C, reaches the RECEIVE(A) instruction and, if necessary, waits for the SEND message from A. Only after receiving this message does B request the resources for the circuit and send a RECEIVE message to A. The communication with C takes place only after B sends a RECEIVE message to C, i.e. when B executes the RECEIVE(C) instruction, which happens only after the A-to-B transfer is completed.

According to the scenario described above, a core must be able to store all the SEND requests that may arrive from the other cores (15 in this case), and it serves them by sending the RECEIVE messages in the same order in which it executes the corresponding rendezvous instructions. This allows each core to reserve a circuit only when both ends of the communication have reached the rendezvous point, so as to avoid any possibility of deadlock. On the other hand, as previously discussed, the receiver core sends the RECEIVE message only when the Rx of the LDL has already been reserved, i.e. when the grant from the low-swing network node has been received. With this constraint, race conditions are avoided between pairs of cores that share the same LDL.
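The A, B, C scenario above can be replayed with a small Python sketch of the receiver side. The function name, the event labels and the list-based buffer are hypothetical; link arbitration and the data transfer itself are collapsed into a single "receive_sent" event, and every sender named in the program is assumed to issue its SEND eventually.

```python
def replay_receiver(receive_order, send_arrivals):
    """Replay the receiver behaviour and return the list of events.

    receive_order: senders in the order of the receiver's RECEIVE instructions.
    send_arrivals: senders in the order their SEND control messages arrive.
    """
    buffered = []                 # SEND requests stored but not yet served
    arrivals = iter(send_arrivals)
    events = []
    for expected in receive_order:
        # Keep storing incoming SEND requests until the expected one is present.
        while expected not in buffered:
            sender = next(arrivals)
            buffered.append(sender)
            events.append(("store", sender))
        buffered.remove(expected)
        # Rx granted, RECEIVE sent, transfer completes before the next RECEIVE.
        events.append(("receive_sent", expected))
    return events
```

Calling `replay_receiver(["A", "C"], ["C", "A"])` stores the SEND from C, then the SEND from A, and only then answers A followed by C, which matches the order of the RECEIVE instructions in the code of B rather than the arrival order of the SEND messages.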

X. MAIN DESIGN AREAS

The design effort for the proposed NoC can be divided into four main parts:

1) LSNI and low-swing network node development and testing;
2) Router and full-swing network design, with routing and flow control testing;
3) RAM, PU, DMA and FSNI design, with instruction fetch, instruction translation and data transfer testing;
4) Development of a system-level simulator (based on OMNeT++) for application-level testing and to support the programming of the various traffic scenarios for the Test-Chip.

XI. LSNI AND LOW-SWING NETWORK NODE INTERFACE

The interface between the LSNI and the low-swing network node allows the LSNI to request an LDL and to receive the grant to use that resource, after an arbitration protocol that is executed in the low-swing network node and takes into account the requests coming from all four connected cores. A bidirectional data bus is also provided to transfer data between the core RAM and the low-swing network node. Figure 5 shows the signals of the interface:

- Request TX: for the LSNI to request a transmission
- Request RX: for the LSNI to request a reception
- Grant: for the low-swing network node to grant the requested link
- Destination: for the LSNI to indicate the addressed remote low-swing network node, i.e. the requested link
- Data: to transfer the data in both directions (a single bus is sufficient because a core cannot receive and send data at the same time)
- Data valid: to validate the data on the bus on the rising edge of the clock

Fig. 5. Interface between LSNI and low-swing network node.

While the Data and Data valid signals run at 500MHz to simplify the design of the RAM controller, the other signals run at 1GHz in order to speed up the reservation protocol. Figure 6 shows the request protocol for the reservation of an LDL. The Request signal (either TX or RX) is asserted when the core asks for an LDL and remains high until the end of the transfer.
The Grant signal is asserted when the resource is available for the requesting core, and it remains high as long as the Request is high.

Fig. 6. LDL request protocol.

XII. OMNET++ SIMULATOR

We plan to model the entire system at two levels of granularity:

- Message based: every transaction is a message that contains all the data, is delivered to the destination and represents the whole data transfer (including the control message exchange). At this level of granularity we speed up the simulation by neglecting the flow control, we verify that the communication pattern is deadlock-free, and we obtain the expected final configuration at the end of the simulation.
- Flit based: the data messages (worms) are composed of a head flit and a tail flit only, without modelling the flits between them. The head flit is routed and reserves the path. The tail flit follows the same path at the same speed as the head flit, and it is blocked when the head does not win a contention. This approach keeps the computational effort low while still modelling the flow control, and it makes design exploration possible. All the data are stored in and carried by the tail flit. The expected final configuration is the same, because it is not affected by the flow control.

For both levels of abstraction we plan to model, besides the channels:

- the core, following the model in Figure 2
- the router, with two modules modelling the arbitration mechanism and the routing algorithm
- the low-swing network node

It is necessary to have full control of the memory. Therefore the data memory is represented as a HEX file, while the program memory is represented as an ASCII file containing the assembly code of the program. A traffic scenario is represented by the following files:

- 16 .TXT files, each containing the program for a core, as input of the simulator
- 16 .ASM files, each containing the translation of the corresponding .TXT file, to load into the program memories of the real chip
- 16 .HEX files, each containing the initial configuration of the data memory for a core, as input of both the simulator and the real chip
- 16 .HEX files, each containing the expected final configuration of the core data memory, as output of the simulator
- 16 .HEX files, each containing the final configuration of the core data memory, as content of the memories after the real chip activity

At the end of the test of a scenario, the expected final configuration and the final configuration must fully match.

XIII. QUANTITATIVE DESIGN DIMENSIONING

- Global clock frequency: 1GHz
- RAM clock frequency: 500MHz
- Circuit-switched network line-rate: 80Gbps
- Flit width: 18bit (16bit for data + 2bit for control)
- Packet-switched network line-rate: 16Gbps
- Memory word: 16bit
- Data bus width between LSNI and low-swing network node: 160bit
- Data bus frequency between LSNI and low-swing network node: 500MHz
- Number of interleaved RAM units: 10
- The code and the data RAM are managed and loaded separately
- Data RAM block size: TBD
- Code RAM size: TBD

XIV. OPEN ISSUES

The data bus in the interface between the LSNI and the low-swing network node is composed of 160 wires, with a clock frequency of 500MHz and a non-negligible length (around 0.8mm for an 8mm side chip). These factors suggest a high power dissipation on that bus. A low-power interconnect would be preferable on that interface.
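The figures of Section XIII, including the 160-wire bus discussed above, appear mutually consistent under a simple reading that the following Python sketch makes explicit. The reading itself is an assumption: each interleaved RAM unit is taken to deliver one 16-bit memory word per RAM clock cycle, and the packet-switched line-rate is taken to count only the 16 data bits of each flit. The function names are hypothetical.

```python
MEMORY_WORD_BITS = 16     # memory word width from Section XIII
RAM_CLOCK_GHZ = 0.5       # RAM clock frequency (500MHz)
GLOBAL_CLOCK_GHZ = 1.0    # global clock frequency (1GHz)

def bus_width_bits(num_ram_units):
    """Width of the data bus if every RAM unit supplies one word per cycle."""
    return num_ram_units * MEMORY_WORD_BITS

def circuit_rate_gbps(num_ram_units):
    """Data rate the interleaved RAM units can feed to the circuit network."""
    return bus_width_bits(num_ram_units) * RAM_CLOCK_GHZ

def packet_rate_gbps(data_bits_per_flit=16):
    """Data rate of a mesh link that moves one flit per global clock cycle."""
    return data_bits_per_flit * GLOBAL_CLOCK_GHZ
```

With 10 RAM units this gives a 160-bit bus and 80Gbps toward the circuit-switched network, and 16Gbps for the packet-switched network, which matches the values listed in Section XIII. Changing the number of RAM units scales the first two figures proportionally.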

The number of interleaved RAM blocks is set to 10. This makes address generation awkward, because a modulo-10 counter is needed to access each block in turn. A number of blocks that is a power of 2 would simplify the design of the memory controller. To achieve that, a line rate that is a multiple of 2Gbps is needed for the LDLs, i.e. 8Gbps instead of 10Gbps per wire. It may therefore be worth considering a reduction of the LDL bandwidth in order to simplify the RAM access.

The size of each memory block and of the code RAM has to be defined based on the layout constraints. The division of the memory between data and code has to be made depending on the size of the transfers and the number of communications that should be run for a complete study of the system.

XV. TIMELINE

- Dec 7: Modelling of the LDL and low-swing network node RTL description
- Jan 8: RTL description of the entire chip for cycle-accurate simulations
- Jan 8: Simulation environment with some deadlock-free traffic patterns
- Jan 15: Automatic stochastic generator of deadlock-free traffic patterns
- Apr 15: Tapeout


More information

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari Global Journal of Computer Science and Technology: E Network, Web & Security Volume 15 Issue 6 Version 1.0 Year 2015 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Young Hoon Kang, Taek-Jun Kwon, and Jeff Draper {youngkan, tjkwon, draper}@isi.edu University of Southern California

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

OASIS Network-on-Chip Prototyping on FPGA

OASIS Network-on-Chip Prototyping on FPGA Master thesis of the University of Aizu, Feb. 20, 2012 OASIS Network-on-Chip Prototyping on FPGA m5141120, Kenichi Mori Supervised by Prof. Ben Abdallah Abderazek Adaptive Systems Laboratory, Master of

More information

Evaluation of NOC Using Tightly Coupled Router Architecture

Evaluation of NOC Using Tightly Coupled Router Architecture IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 01-05 www.iosrjournals.org Evaluation of NOC Using Tightly Coupled Router

More information

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information

Lecture 18: Communication Models and Architectures: Interconnection Networks

Lecture 18: Communication Models and Architectures: Interconnection Networks Design & Co-design of Embedded Systems Lecture 18: Communication Models and Architectures: Interconnection Networks Sharif University of Technology Computer Engineering g Dept. Winter-Spring 2008 Mehdi

More information

PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS. A Thesis SONALI MAHAPATRA

PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS. A Thesis SONALI MAHAPATRA PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS A Thesis by SONALI MAHAPATRA Submitted to the Office of Graduate and Professional Studies of Texas A&M University

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

Quality-of-Service for a High-Radix Switch

Quality-of-Service for a High-Radix Switch Quality-of-Service for a High-Radix Switch Nilmini Abeyratne, Supreet Jeloka, Yiping Kang, David Blaauw, Ronald G. Dreslinski, Reetuparna Das, and Trevor Mudge University of Michigan 51 st DAC 06/05/2014

More information

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip ASP-DAC 2010 20 Jan 2010 Session 6C Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip Jonas Diemer, Rolf Ernst TU Braunschweig, Germany diemer@ida.ing.tu-bs.de Michael Kauschke Intel,

More information

Networks-on-Chip Router: Configuration and Implementation

Networks-on-Chip Router: Configuration and Implementation Networks-on-Chip : Configuration and Implementation Wen-Chung Tsai, Kuo-Chih Chu * 2 1 Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 413, Taiwan,

More information

NOC: Networks on Chip SoC Interconnection Structures

NOC: Networks on Chip SoC Interconnection Structures NOC: Networks on Chip SoC Interconnection Structures COE838: Systems-on-Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering

More information

INTERCONNECTION NETWORKS LECTURE 4

INTERCONNECTION NETWORKS LECTURE 4 INTERCONNECTION NETWORKS LECTURE 4 DR. SAMMAN H. AMEEN 1 Topology Specifies way switches are wired Affects routing, reliability, throughput, latency, building ease Routing How does a message get from source

More information

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID 1 Virtual Channel Flow Control Each switch has multiple virtual channels per phys. channel Each virtual

More information

Lecture 22: Router Design

Lecture 22: Router Design Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO 03, Princeton A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip

More information

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel Hyoukjun Kwon and Tushar Krishna Georgia Institute of Technology Synergy Lab (http://synergy.ece.gatech.edu) hyoukjun@gatech.edu April

More information

Parallel Computing 39 (2013) Contents lists available at SciVerse ScienceDirect. Parallel Computing

Parallel Computing 39 (2013) Contents lists available at SciVerse ScienceDirect. Parallel Computing Parallel Computing 39 (2013) 424 441 Contents lists available at SciVerse ScienceDirect Parallel Computing journal homepage: www.elsevier.com/locate/parco A hardware/software platform for QoS bridging

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Interconnection Networks

Interconnection Networks Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact

More information

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal Lecture 19 Interconnects: Flow Control Winter 2018 Subhankar Pal http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin, Narayanasamy, Nowatzyk,

More information

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology Outline SoC Interconnect NoC Introduction NoC layers Typical NoC Router NoC Issues Switching

More information

Design of Router Architecture Based on Wormhole Switching Mode for NoC

Design of Router Architecture Based on Wormhole Switching Mode for NoC International Journal of Scientific & Engineering Research Volume 3, Issue 3, March-2012 1 Design of Router Architecture Based on Wormhole Switching Mode for NoC L.Rooban, S.Dhananjeyan Abstract - Network

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

Low-Power Interconnection Networks

Low-Power Interconnection Networks Low-Power Interconnection Networks Li-Shiuan Peh Associate Professor EECS, CSAIL & MTL MIT 1 Moore s Law: Double the number of transistors on chip every 2 years 1970: Clock speed: 108kHz No. transistors:

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design white paper Intel FPGA Applying the Benefits of on a Chip Architecture to FPGA System Design Authors Kent Orthner Senior Manager, Software and IP Intel Corporation Table of Contents Abstract...1 Introduction...1

More information

Comparison of Deadlock Recovery and Avoidance Mechanisms to Approach Message Dependent Deadlocks in on-chip Networks

Comparison of Deadlock Recovery and Avoidance Mechanisms to Approach Message Dependent Deadlocks in on-chip Networks Comparison of Deadlock Recovery and Avoidance Mechanisms to Approach Message Dependent Deadlocks in on-chip Networks Andreas Lankes¹, Soeren Sonntag², Helmut Reinig³, Thomas Wild¹, Andreas Herkersdorf¹

More information

Chapter 4. MARIE: An Introduction to a Simple Computer. Chapter 4 Objectives. 4.1 Introduction. 4.2 CPU Basics

Chapter 4. MARIE: An Introduction to a Simple Computer. Chapter 4 Objectives. 4.1 Introduction. 4.2 CPU Basics Chapter 4 Objectives Learn the components common to every modern computer system. Chapter 4 MARIE: An Introduction to a Simple Computer Be able to explain how each component contributes to program execution.

More information

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER A Thesis by SUNGHO PARK Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

Design and Implementation of Buffer Loan Algorithm for BiNoC Router Design and Implementation of Buffer Loan Algorithm for BiNoC Router Deepa S Dev Student, Department of Electronics and Communication, Sree Buddha College of Engineering, University of Kerala, Kerala, India

More information

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks Lecture: Transactional Memory, Networks Topics: TM implementations, on-chip networks 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock

More information

The Tofu Interconnect 2

The Tofu Interconnect 2 The Tofu Interconnect 2 Yuichiro Ajima, Tomohiro Inoue, Shinya Hiramoto, Shun Ando, Masahiro Maeda, Takahide Yoshikawa, Koji Hosoe, and Toshiyuki Shimizu Fujitsu Limited Introduction Tofu interconnect

More information

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults 1/45 1/22 MICRO-46, 9 th December- 213 Davis, California udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults Ritesh Parikh and Valeria Bertacco Electrical Engineering & Computer

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom ISCA 2018 Session 8B: Interconnection Networks Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia Tech (aniruddh@gatech.edu) Tushar

More information

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 02, 2014 ISSN (online): 2321-0613 Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek

More information

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques Nandini Sultanpure M.Tech (VLSI Design and Embedded System), Dept of Electronics and Communication Engineering, Lingaraj

More information

1. INTRODUCTION light tree First Generation Second Generation Third Generation

1. INTRODUCTION light tree First Generation Second Generation Third Generation 1. INTRODUCTION Today, there is a general consensus that, in the near future, wide area networks (WAN)(such as, a nation wide backbone network) will be based on Wavelength Division Multiplexed (WDM) optical

More information

Network on Chip Architectures BY JAGAN MURALIDHARAN NIRAJ VASUDEVAN

Network on Chip Architectures BY JAGAN MURALIDHARAN NIRAJ VASUDEVAN Network on Chip Architectures BY JAGAN MURALIDHARAN NIRAJ VASUDEVAN Multi Core Chips No more single processor systems High computational power requirements Increasing clock frequency increases power dissipation

More information

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized

More information

CS521 CSE IITG 11/23/2012

CS521 CSE IITG 11/23/2012 A ahu 1 Topology : who is connected to whom? Direct / Indirect : where is switching done? tatic / Dynamic : when is switching done? Circuit switching / packet switching : how are connections estalished?

More information

Conquering Memory Bandwidth Challenges in High-Performance SoCs

Conquering Memory Bandwidth Challenges in High-Performance SoCs Conquering Memory Bandwidth Challenges in High-Performance SoCs ABSTRACT High end System on Chip (SoC) architectures consist of tens of processing engines. In SoCs targeted at high performance computing

More information

Hardware Design, Synthesis, and Verification of a Multicore Communications API

Hardware Design, Synthesis, and Verification of a Multicore Communications API Hardware Design, Synthesis, and Verification of a Multicore Communications API Benjamin Meakin Ganesh Gopalakrishnan University of Utah School of Computing {meakin, ganesh}@cs.utah.edu Abstract Modern

More information

A Flexible Design of Network on Chip Router based on Handshaking Communication Mechanism

A Flexible Design of Network on Chip Router based on Handshaking Communication Mechanism A Flexible Design of Network on Chip Router based on Handshaking Communication Mechanism Seyyed Amir Asghari, Hossein Pedram and Mohammad Khademi 2 Amirkabir University of Technology 2 Shahid Beheshti

More information

NoC Simulation in Heterogeneous Architectures for PGAS Programming Model

NoC Simulation in Heterogeneous Architectures for PGAS Programming Model NoC Simulation in Heterogeneous Architectures for PGAS Programming Model Sascha Roloff, Andreas Weichslgartner, Frank Hannig, Jürgen Teich University of Erlangen-Nuremberg, Germany Jan Heißwolf Karlsruhe

More information

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University Hybrid On-chip Data Networks Gilbert Hendry Keren Bergman Lightwave Research Lab Columbia University Chip-Scale Interconnection Networks Chip multi-processors create need for high performance interconnects

More information

Design and Simulation of Router Using WWF Arbiter and Crossbar

Design and Simulation of Router Using WWF Arbiter and Crossbar Design and Simulation of Router Using WWF Arbiter and Crossbar M.Saravana Kumar, K.Rajasekar Electronics and Communication Engineering PSG College of Technology, Coimbatore, India Abstract - Packet scheduling

More information

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels Lecture: Interconnection Networks Topics: TM wrap-up, routing, deadlock, flow control, virtual channels 1 TM wrap-up Eager versioning: create a log of old values Handling problematic situations with a

More information

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation

More information

Connection-oriented Multicasting in Wormhole-switched Networks on Chip

Connection-oriented Multicasting in Wormhole-switched Networks on Chip Connection-oriented Multicasting in Wormhole-switched Networks on Chip Zhonghai Lu, Bei Yin and Axel Jantsch Laboratory of Electronics and Computer Systems Royal Institute of Technology, Sweden fzhonghai,axelg@imit.kth.se,

More information

Evaluating Bufferless Flow Control for On-Chip Networks

Evaluating Bufferless Flow Control for On-Chip Networks Evaluating Bufferless Flow Control for On-Chip Networks George Michelogiannakis, Daniel Sanchez, William J. Dally, Christos Kozyrakis Stanford University In a nutshell Many researchers report high buffer

More information

Switching and Forwarding Reading: Chapter 3 1/30/14 1

Switching and Forwarding Reading: Chapter 3 1/30/14 1 Switching and Forwarding Reading: Chapter 3 1/30/14 1 Switching and Forwarding Next Problem: Enable communication between hosts that are not directly connected Fundamental Problem of the Internet or any

More information

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Outline Introduction Overview of WiNoC system architecture Overlaid

More information

CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION. Stephen Chui Bachelor of Engineering Ryerson University, 2012.

CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION. Stephen Chui Bachelor of Engineering Ryerson University, 2012. CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION by Stephen Chui Bachelor of Engineering Ryerson University, 2012 A thesis presented to Ryerson University in partial fulfillment of the

More information

OASIS NoC Architecture Design in Verilog HDL Technical Report: TR OASIS

OASIS NoC Architecture Design in Verilog HDL Technical Report: TR OASIS OASIS NoC Architecture Design in Verilog HDL Technical Report: TR-062010-OASIS Written by Kenichi Mori ASL-Ben Abdallah Group Graduate School of Computer Science and Engineering The University of Aizu

More information

Interconnection Network Project EE482 Advanced Computer Organization May 28, 1999

Interconnection Network Project EE482 Advanced Computer Organization May 28, 1999 Interconnection Network Project EE482 Advanced Computer Organization May 28, 1999 Group Members: Overview Tom Fountain (fountain@cs.stanford.edu) T.J. Giuli (giuli@cs.stanford.edu) Paul Lassa (lassa@relgyro.stanford.edu)

More information

Real-Time Mixed-Criticality Wormhole Networks

Real-Time Mixed-Criticality Wormhole Networks eal-time Mixed-Criticality Wormhole Networks Leandro Soares Indrusiak eal-time Systems Group Department of Computer Science University of York United Kingdom eal-time Systems Group 1 Outline Wormhole Networks

More information

Advanced Computer Networks. Flow Control

Advanced Computer Networks. Flow Control Advanced Computer Networks 263 3501 00 Flow Control Patrick Stuedi Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Last week TCP in Datacenters Avoid incast problem - Reduce

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

DESIGN, IMPLEMENTATION AND EVALUATION OF A CONFIGURABLE. NoC FOR AcENoCS FPGA ACCELERATED EMULATION PLATFORM. A Thesis SWAPNIL SUBHASH LOTLIKAR

DESIGN, IMPLEMENTATION AND EVALUATION OF A CONFIGURABLE. NoC FOR AcENoCS FPGA ACCELERATED EMULATION PLATFORM. A Thesis SWAPNIL SUBHASH LOTLIKAR DESIGN, IMPLEMENTATION AND EVALUATION OF A CONFIGURABLE NoC FOR AcENoCS FPGA ACCELERATED EMULATION PLATFORM A Thesis by SWAPNIL SUBHASH LOTLIKAR Submitted to the Office of Graduate Studies of Texas A&M

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Part IV: 3D WiNoC Architectures

Part IV: 3D WiNoC Architectures Wireless NoC as Interconnection Backbone for Multicore Chips: Promises, Challenges, and Recent Developments Part IV: 3D WiNoC Architectures Hiroki Matsutani Keio University, Japan 1 Outline: 3D WiNoC Architectures

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

CH : 15 LOCAL AREA NETWORK OVERVIEW

CH : 15 LOCAL AREA NETWORK OVERVIEW CH : 15 LOCAL AREA NETWORK OVERVIEW P. 447 LAN (Local Area Network) A LAN consists of a shared transmission medium and a set of hardware and software for interfacing devices to the medium and regulating

More information

Fast, Accurate and Detailed NoC Simulations

Fast, Accurate and Detailed NoC Simulations Fast, Accurate and Detailed NoC Simulations Pascal T. Wolkotte and Philip K.F. Hölzenspies and Gerard J.M. Smit University of Twente, Department of EEMCS P.O. Box 217, 75 AE Enschede, The Netherlands P.T.Wolkotte@utwente.nl

More information

A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin

A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin 50 IEEE EMBEDDED SYSTEMS LETTERS, VOL. 1, NO. 2, AUGUST 2009 A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin Abstract Programmable many-core processors are poised

More information

Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization

Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization Basic Network-on-Chip (BANC) interconnection for Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization Abderazek Ben Abdallah, Masahiro Sowa Graduate School of Information

More information

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC QoS Aware BiNoC Architecture Shih-Hsin Lo, Ying-Cherng Lan, Hsin-Hsien Hsien Yeh, Wen-Chung Tsai, Yu-Hen Hu, and Sao-Jie Chen Ying-Cherng Lan CAD System Lab Graduate Institute of Electronics Engineering

More information

Absolute QoS Differentiation in Optical Burst-Switched Networks

Absolute QoS Differentiation in Optical Burst-Switched Networks IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 22, NO. 9, NOVEMBER 2004 1781 Absolute QoS Differentiation in Optical Burst-Switched Networks Qiong Zhang, Student Member, IEEE, Vinod M. Vokkarane,

More information

Advanced Computer Networks. Flow Control

Advanced Computer Networks. Flow Control Advanced Computer Networks 263 3501 00 Flow Control Patrick Stuedi, Qin Yin, Timothy Roscoe Spring Semester 2015 Oriana Riva, Department of Computer Science ETH Zürich 1 Today Flow Control Store-and-forward,

More information

[ ] In earlier lectures, we have seen that switches in an interconnection network connect inputs to outputs, usually with some kind buffering.

[ ] In earlier lectures, we have seen that switches in an interconnection network connect inputs to outputs, usually with some kind buffering. Switch Design [ 10.3.2] In earlier lectures, we have seen that switches in an interconnection network connect inputs to outputs, usually with some kind buffering. Here is a basic diagram of a switch. Receiver

More information

VLSI D E S. Siddhardha Pottepalem

VLSI D E S. Siddhardha Pottepalem HESIS UBMITTED IN ARTIAL ULFILLMENT OF THE EQUIREMENTS FOR THE EGREE OF M T IN VLSI D E S BY Siddhardha Pottepalem EPARTMENT OF LECTRONICS AND OMMUNICATION NGINEERING ATIONAL NSTITUTE OF ECHNOLOGY OURKELA

More information

Address InterLeaving for Low- Cost NoCs

Address InterLeaving for Low- Cost NoCs Address InterLeaving for Low- Cost NoCs Miltos D. Grammatikakis, Kyprianos Papadimitriou, Polydoros Petrakis, Marcello Coppola, and Michael Soulie Technological Educational Institute of Crete, GR STMicroelectronics,

More information

RHiNET-3/SW: an 80-Gbit/s high-speed network switch for distributed parallel computing

RHiNET-3/SW: an 80-Gbit/s high-speed network switch for distributed parallel computing RHiNET-3/SW: an 0-Gbit/s high-speed network switch for distributed parallel computing S. Nishimura 1, T. Kudoh 2, H. Nishi 2, J. Yamamoto 2, R. Ueno 3, K. Harasawa 4, S. Fukuda 4, Y. Shikichi 4, S. Akutsu

More information

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema [1] Laila A, [2] Ajeesh R V [1] PG Student [VLSI & ES] [2] Assistant professor, Department of ECE, TKM Institute of Technology, Kollam

More information

UNIT- 2 Physical Layer and Overview of PL Switching

UNIT- 2 Physical Layer and Overview of PL Switching UNIT- 2 Physical Layer and Overview of PL Switching 2.1 MULTIPLEXING Multiplexing is the set of techniques that allows the simultaneous transmission of multiple signals across a single data link. Figure

More information

William Stallings Computer Organization and Architecture 10 th Edition Pearson Education, Inc., Hoboken, NJ. All rights reserved.

William Stallings Computer Organization and Architecture 10 th Edition Pearson Education, Inc., Hoboken, NJ. All rights reserved. + William Stallings Computer Organization and Architecture 10 th Edition 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved. 2 + Chapter 3 A Top-Level View of Computer Function and Interconnection

More information