Design of High-Radix Clos Network-on-Chip


2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip

Yu-Hsiang Kao, Najla Alfaraj, Ming Yang, and H. Jonathan Chao
Department of Electrical and Computer Engineering
Polytechnic Institute of New York University
Brooklyn, NY, USA

Abstract

Many high-radix Network-on-Chip (NOC) topologies have been proposed to improve network performance with an ever-growing number of processing elements (PEs) on a chip. We believe Clos Network-on-Chip (CNOC) is the most promising with its low average hop counts and good load-balancing characteristics. In this paper, we propose (1) a high-radix router architecture with Virtual Output Queue (VOQ) buffer structure and Packet Mode Dual Round-Robin Matching (PDRRM) scheduling algorithm to achieve high speed and high throughput in CNOC, (2) a heuristic floor-planning algorithm to minimize the power consumption caused by the long wires. Experimental results show that the throughput of a 64-node 3-stage CNOC under uniform traffic increases from 62% to 78% by replacing the baseline routers with PDRRM VOQ routers. We also compared CNOC with other NOC topologies, and found that using the new design techniques, CNOC has the highest throughput, lowest zero-load latency, and best power efficiency.

Keywords: Network on Chip; Chip Multiprocessor; Clos network; High radix NOC

I. INTRODUCTION

Chip multiprocessors (CMP) have become favored over traditional superscalar processors for efficiently exploiting single-chip computational potential. One major factor motivating CMP development is that computation speed can be increased with only a modest increase in power. CMP systems consist of regular processing elements (PE) and memory modules. In a CMP system, the chip area is normally divided into a number of tiles, each containing a PE or a memory module. Tiles are interconnected through a Network-on-Chip (NOC), which has a great influence on CMP performance.
The performance of an NOC is determined by many factors, including topologies, routing algorithms, and flow control mechanisms. Topology determines the capacity of an NOC. The capacity is defined as the best possible throughput, assuming perfect routing and flow control, that could be achieved by the network under the given traffic pattern [20]. After the topology is decided, the routing algorithm and flow control are developed to arrange the management of physical resources in an efficient way to approach the performance bound. Many NOC topologies proposed to date consist of regular interconnection structures and low-radix routers, such as 2-D Mesh and 2-D Torus for ease of implementation. However, as more tiles or network nodes are put on the same chip, latency is increased due to the rapidly growing queueing delay in each router along with a large network diameter. Also, due to the nature of the network configuration, the capacity of a low-radix network is usually low, causing low throughput and low power efficiency. A high-radix Clos network [1] provides much better scalability than a low-radix network in terms of zero-load latency and throughput. A 3- or 5-stage Clos network can easily accommodate several hundred nodes with a reasonable router radix. The number of hops a packet traverses in the Clos network is limited to three or five. Thus, a high-radix Clos Network-on-Chip (CNOC) provides smaller zero-load latency as compared to a low-radix NOC. Another advantage of CNOC is its good load-balancing nature, from the multiple paths available between any pair of PEs. One major concern for the Clos network is its large number of long interconnects, which may lead to an increased routing area and power dissipation. With Routing over logic layout style [2], the area overhead caused by the long wires can be eliminated. 
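The hop-count argument above can be made concrete with a small calculation. The following is a minimal sketch, not taken from the paper: it counts link traversals between routers of a k x k 2-D Mesh under minimal routing, for comparison with the fixed three router hops of a 3-stage Clos network. The function name `mesh_hop_stats` is hypothetical, and the average is taken over all ordered pairs of distinct routers, which is an assumption about how "average" is defined.

```python
from itertools import product

def mesh_hop_stats(k):
    """Return (diameter, average) Manhattan distance between distinct
    routers of a k x k 2-D mesh, measured in link traversals."""
    nodes = list(product(range(k), repeat=2))
    dists = [abs(ax - bx) + abs(ay - by)
             for (ax, ay), (bx, by) in product(nodes, nodes)
             if (ax, ay) != (bx, by)]
    return max(dists), sum(dists) / len(dists)

# 64-node mesh: one router per tile, 8 x 8 grid (assumption for illustration).
diameter, average = mesh_hop_stats(8)
print(diameter, round(average, 2))
```

For the 8 x 8 case the diameter grows with the grid side, whereas the 3-stage Clos path length stays at three routers regardless of which pair of PEs communicates.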
One method to overcome the energy problem caused by the long interconnects is to minimize the average power consumption under uniform traffic by carefully designing the floor plan of an NOC. In other words, given an arbitrary NOC topology, placing the routers properly can significantly reduce the average power consumption on the long wires under uniform traffic. In this paper, we propose: (1) the design of the high-radix CNOC with Virtual Output Queue (VOQ) buffer structure and Packet Mode Dual Round-Robin Matching (PDRRM) scheduling algorithm in the routers to achieve high speed and high throughput, and (2) a heuristic floor-planning algorithm to minimize the power consumption caused by the long wires. Based on the design, we describe a study comparing a 64-node CNOC composed of radix-8 routers to other topologies with the same router radix upper bound, including 2-D Mesh, Fat Tree [3], and Concentrated MeshX2 [4]. We show that with the PDRRM VOQ router, CNOC has the smallest zero-load latency, highest throughput, and best power efficiency compared to other high-radix topologies and the 2-D Mesh network.

II. RELATED WORK

Low-radix NOCs, such as the 2-D Mesh or Torus network, have the advantage of modest design complexity with a regular interconnection structure and short wires. However, these networks suffer several disadvantages, including large network diameter and energy inefficiencies due to higher hop counts. To overcome this issue, several high-radix NOC topologies have been proposed during the past few years. Balfour and Dally [4] proposed Concentrated Mesh (CMesh) with express channels, which adapts the 2-D Mesh by allowing several tiles to connect to the same router, and adding express channels on the edge of the Mesh. Kim et al. [5] proposed Flattened

Butterfly, which has a regular floor plan like CMesh, with an even higher router radix to achieve lower zero-load latency and better throughput. Some studies have been done to evaluate Fat Tree [6], which could be configured as a high-radix network as well. Clos network topology has been considered for NOC design in the SPIN [7] and Reduced Unidirectional Fat Tree (RUFT) [8] networks. In the SPIN network, the local traffic between PEs connected to the same router does not need to travel more than one hop. In the 3-stage CNOC, by contrast, every packet needs to travel exactly three hops to reach its destination port. [8] showed that by replacing Fat Tree with RUFT, which does not allow local traffic, hardware complexity and power consumption are significantly reduced. However, [8] did not address the high-radix Clos network configuration and implementation issues. Neither [7] nor [8] provided latency and power comparisons between the Clos network and other NOC topologies.

III. CNOC ARCHITECTURE

Figure 1 shows the configuration of the 64-node, 3-stage CNOC that we adopted in this paper. The switch modules (SMs) in the first, second, and third stages are denoted as input modules (IMs), center modules (CMs), and output modules (OMs). In this configuration, there are 24 routers, 8 for each stage, with the network capacity at 100% under uniform traffic. One thing to notice is that all the links shown in the figure are uni-directional, rather than bi-directional. Hence, in CNOC every flit has to travel exactly three hops from source to destination. In the Fat Tree network, the localized traffic does not need to travel all the way to the root router, because the links between routers and PEs are bi-directional. Figure 2 shows the configuration of a 64-node Fat Tree, in which there are 32 radix-8 routers and 16 radix-4 routers. The two localized traffic streams are illustrated by red lines.
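As an illustration of the three-hop property, here is a minimal sketch of how a path through the 64-node, 3-stage CNOC could be computed. The zero-based numbering, the mapping of PE p to IM/OM index p // 8, and the names `clos_path` and `RoundRobinCM` are assumptions made for this example, not details from the paper. The round-robin choice of center module follows the paper's statement that packets are spread over the CMs in a round-robin manner.

```python
STAGE_RADIX = 8   # routers per stage and ports per router in the 64-node CNOC
NUM_PES = STAGE_RADIX * STAGE_RADIX

def clos_path(src, dst, cm):
    """Return the (IM, CM, OM) indices a packet visits, all zero-based.
    Assumes PE p attaches to IM p // 8 and OM p // 8 (hypothetical numbering)."""
    if not (0 <= src < NUM_PES and 0 <= dst < NUM_PES and 0 <= cm < STAGE_RADIX):
        raise ValueError("index out of range")
    return (src // STAGE_RADIX, cm, dst // STAGE_RADIX)

class RoundRobinCM:
    """Per-source pointer that cycles through the center modules."""
    def __init__(self):
        self.next_cm = 0

    def pick(self):
        cm = self.next_cm
        self.next_cm = (self.next_cm + 1) % STAGE_RADIX
        return cm

picker = RoundRobinCM()
paths = [clos_path(10, 63, picker.pick()) for _ in range(3)]
print(paths)  # every path has exactly three routers; only the CM changes
```

Every source and destination pair, including two PEs attached to the same IM and OM index, still traverses three routers, which is the difference from the Fat Tree's localized traffic.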
Compared to the 3-stage Clos network configuration, the Fat Tree network requires more routers and has a higher wiring complexity. As a result, Fat Tree has much lower power efficiency than CNOC, as will be shown in Section VI.

Figure 1. Configuration of a 64-node 3-stage CNOC. Rows from top to bottom: input ports of the PEs, OMs (c1-c8), CMs (b1-b8), IMs (a1-a8), output ports of the PEs. All links are unidirectional, and packets can only travel upward. All routers are of radix-8.

Figure 2. Configuration of a 64-node Fat Tree. All links are bi-directional. Localized traffic is shown as the red lines.

In CNOC, each router is implemented as an input-queued crossbar switch. CNOC applies wormhole switching, in which packets are divided into a number of fixed-length flits. A credit-based flow control mechanism is also applied: on each link, the upstream entity keeps track of the available buffer space in the downstream entity to prevent buffer overflow. The architecture of the SM can be the same as the baseline Virtual Channel (VC) router described in [14], while in this paper we propose a VOQ-based NOC router to increase throughput and reduce average latency. To route a packet in CNOC once the destination address is decided, there are multiple choices of paths, each corresponding to a CM. In this paper, we forward the packets in a round-robin manner for ease of implementation and for its good load-balancing feature.

IV. PDRRM VOQ ROUTER DESIGN

The input buffer structure and switch allocation scheme are two main factors that affect NOC router performance. To resolve the well-known Head of Line (HOL) blocking problem caused by the single FIFO structure in input-queued routers, [13] first proposed the concept of VC, which improves performance by decoupling the physical channels from the buffer resource.
Today, the VC structure has become the dominant input buffer structure in NOC for its simplicity and efficiency in improving the throughput and avoiding deadlock. The choice of Switch Allocation (SA) scheme in a VC router is the other factor that greatly affects the router throughput. Different allocation schemes, such as the input-first/output-first separable allocator and the wavefront allocator, have been proposed to increase the matching quality in a VC router. However, without any speedup in the switch or multiple iterations in SA, none of these methods can give satisfying throughput with the conventional single crossbar, input-queued VC router design. According to our simulation results, an 8×8 router with 8 VCs can only achieve 65% throughput with an input-first separable allocator and round-robin arbiters. Sophisticated designs proposed in the past few years aim at high-throughput router design [15][14][16]. These methods require many more hardware resources than the canonical VC-based, input-queued, single crossbar router design with separable allocators. In a high-radix router the critical path usually resides in the allocator, and the input-first/output-first separable allocators provide a shorter critical path than other allocation schemes. This is why [15] chose input-first separable as their allocation scheme. In this paper, we try to find a solution for high-radix, high-throughput router design that maintains the simplicity of the input-first separable allocation scheme for the configuration of a Clos network.

A. VOQ Buffer Structure

The concept of VOQ has been applied to lossless networks to alleviate the HOL blocking problem. In the NOC environment with wormhole switching, the VOQ structure can be regarded as a special case of the VC structure. In a VC router, an input VC can store packets destined for any output port. In a VOQ

router, each VC has an associated output port, and can only store packets destined for that output port. So N VOQs are required by each input port in an N×N switch. A phenomenon, which we term the tail of line (TOL) blocking effect, hinders the performance of VOQ routers when they are connected to form a multi-stage network. This problem appears with packet lengths larger than one flit. NOC routers apply wormhole routing, which forces flits of the same packet to be concatenated as a worm in any logical queue of the network. In a VOQ router, from an output port's perspective, there is one VOQ in each input port that targets this output port. In an N×N switch, each output port has N corresponding VOQs, one in each input port, that could potentially request to send a flit to this output port. In other words, there could be at most N packets targeting the same output port at the same time. If there are two packets destined for the same output port, and both of them aim at the same VOQ in the downstream input port, the flits of the two packets cannot interleave on the output physical channel of the current router. Therefore, one of the packets has to wait until the other packet finishes transmitting before it can send out its first flit. We term this phenomenon TOL blocking because the winning packet's tail flit blocks other packets with the same next-hop VOQ. TOL blocking reduces the throughput of a multi-stage network for two reasons. First, the packets which are blocked by the tail flits of other packets will cause HOL blocking to the successive packets in their logical queues. Second, if a packet is broken for some reason, such as back pressure caused by credit-based flow control, and spreads over several routers, the packets which are blocked by the broken packet's tail flit cannot utilize the unused physical channels. Figure 3 shows an example of TOL blocking. There are two input/output ports in each router, and two VOQs in each input port.
If there are two packets in the upstream router, both in VOQ2 but in different input ports, targeting VOQ1 in downstream router 2, one of them will block the other until the tail flit of the winning packet enters the downstream router. In this example packet2 has to wait until the tail flit of packet1 enters the downstream router. If the routers employ round-robin arbiters to form separable allocators, packet3 and packet1 will receive grants alternately. When packet3 receives grants, flits of packet2 cannot take advantage of the idle output physical link since packet2 is blocked by the tail flit of packet1.

Figure 3. TOL blocking example, showing the upstream router and downstream routers 1 and 2 (VOQ1 and VOQ2 in each input port) with their input and output arbiters.

B. PDRRM Algorithm

To alleviate the TOL blocking involved in a multi-stage network where VOQ routers are used, we propose a scheduling algorithm called PDRRM. Besides alleviating TOL blocking, we want PDRRM to sustain high-speed operation and provide good matching quality with one iteration of arbitration in each clock cycle. To meet these requirements, we design the algorithm based on Dual Round-Robin Matching (DRRM) [17], which is similar to the input-first separable allocation scheme in VC routers and shares its shorter critical path. We also take the concept of SA+VA from [14] to reduce the pipeline stages of the router and eliminate the need for a separate Virtual channel Allocation (VA) stage. In the SA+VA scheme, VA and SA are done in the same clock cycle, but VCs are assigned non-speculatively after SA. VA simply involves finding a free VC for each output port from a pool of free VCs every clock cycle and assigning it to the flits that win SA.
To alleviate the TOL blocking problem, we change the original DRRM algorithm into packet mode, meaning that the pointers of the input arbiters and output arbiters in a separable allocator are updated after a packet has been transferred to the next hop, unless there is no buffer space in the input port of the downstream router or the flits in the winning VOQ drain out. In the description of the algorithm, the flit types are abbreviated as follows: Single flit (S), Head flit (H), Body flit (B), and Tail flit (T). A VOQ is called eligible if it contains a packet that is destined for an available downstream VOQ with free buffer space.

Step 1: Request. Each input arbiter sends an output request corresponding to the first eligible nonempty VOQ in a fixed round-robin order, starting from the current position of the pointer in Step 2. The pointer of the input arbiter is advanced by one location beyond its current position if (1) its request is not granted in Step 2, or (2) the request is granted but after one flit is served, this VOQ becomes empty, or (3) the granted flit is a T or S. Otherwise, the pointer remains unchanged.

Step 2: Grant. If an output arbiter receives one or more requests, it chooses the one that appears next in a fixed round-robin schedule starting from the current position of the pointer. The output notifies each requesting input whether or not its request was granted. The pointer of the output arbiter is incremented to one location beyond the granted input if an S or T is granted. The pointer of the output arbiter remains at the granted input if an H or B is granted. The pointer of the output arbiter remains unchanged if there is no request.

VOQ PDRRM routers achieve high performance in a multi-stage network for three reasons. First, VOQ eradicates the HOL blocking problem in an input-queued switch. VC, on the other hand, can only reduce the frequency of HOL blocking.
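The pointer-update rules of Steps 1 and 2 can be written down as two small pure functions. This is a minimal sketch that follows a literal reading of the rules as stated in the text; the function names and argument layout are hypothetical, the input-side function assumes a request was actually issued in the cycle, and eligibility checks (downstream VOQ availability and credits) are not modelled.

```python
from typing import Optional

PACKET_END = {"S", "T"}  # a single-flit packet or a tail flit closes a packet

def next_input_pointer(ptr: int, n: int, granted: bool,
                       flit: Optional[str], voq_empty_after: bool) -> int:
    """Step 1 rule: position of an input arbiter's pointer after this cycle."""
    if not granted:              # (1) the request lost arbitration in Step 2
        return (ptr + 1) % n
    if voq_empty_after:          # (2) granted, but the VOQ drained
        return (ptr + 1) % n
    if flit in PACKET_END:       # (3) the granted flit was a T or an S
        return (ptr + 1) % n
    return ptr                   # H or B with flits still queued: hold

def next_output_pointer(ptr: int, n: int,
                        granted_input: Optional[int],
                        flit: Optional[str]) -> int:
    """Step 2 rule: position of an output arbiter's pointer after this cycle."""
    if granted_input is None:    # no request this cycle: unchanged
        return ptr
    if flit in PACKET_END:       # S or T: move one past the granted input
        return (granted_input + 1) % n
    return granted_input         # H or B: stay on the granted input

# A head flit holds both pointers; the tail flit releases them.
print(next_input_pointer(2, 8, True, "H", False),
      next_output_pointer(5, 8, 3, "H"))
print(next_input_pointer(2, 8, True, "T", False),
      next_output_pointer(3, 8, 3, "T"))
```

Holding both pointers while H and B flits are served is what keeps an input/output pair matched for the duration of a packet, which is the packet-mode behaviour the text relies on.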
Second, by applying the packet mode switching concept the TOL blocking problem is effectively alleviated, since the scheduler tends not to interleave flits of different packets on the output physical links. Third, packet mode scheduling allows the existing matching between input/output pairs to last for a longer time compared to flit mode scheduling, and the unmatched input ports can gradually augment the

matching pattern by sending requests to the available output ports. This is somewhat similar to the concept of multiple iterations employed in DRRM and iSLIP [18].

C. PDRRM VOQ Router Pipeline

Figure 4 shows our proposed VOQ router pipeline, which consists of 3 stages: (1) Switch+VOQ Allocation (SA+VA) and Buffer Write (BW), (2) Buffer Read (BR), and (3) Switch Traversal (ST). In CNOC, routing computation (RC) is done in the PEs and the RC information for each packet is attached to the head flit. Hence there is no RC or next hop routing computation (NRC) in the pipeline stages.

Figure 4. The proposed PDRRM VOQ router pipeline.

D. Post-layout Simulation Result

Methodology: We designed a VOQ router with the following configuration: 8 input ports, 8 output ports, and a shared memory size that varies among 64, 128, and 256 flits. The VHDL code of the router design is synthesized and analyzed by the Cadence Encounter RTL Compiler on the STMicroelectronics 65nm technology. Then we use SOC Encounter to do automatic place and route. Power evaluation is performed based on an activity factor of 10% for each sub-block. For memory read/write operations, we use the popular memory model CACTI 5.3 from HP Labs to characterize the memory delay and area/power consumption.

TABLE I. 8X8 VOQ ROUTER PERFORMANCE. Columns: Critical Path Delay (ps), Area (um^2), Power at 1 GHz. Rows (sub-blocks): Input Buffer (64-flit), Input Buffer (128-flit), Input Buffer (256-flit), Allocator, Crossbar.

Table I summarizes the simulation results (delay, area, and power) for the proposed 8x8 VOQ router. The input buffer and the allocator have comparable critical path delays, and the power consumption is mainly determined by these two sub-blocks. The simulation results show that the PDRRM router can operate at 1 GHz.

V. CNOC FLOOR PLAN

One common doubt about CNOC is the considerable number of long global interconnects on chip.
With the shrinkage of process dimensions, the delay and power consumption issues of global interconnects become more critical. There are mainly two orthogonal directions for dealing with the delay and power issues of long interconnects on CMPs. One direction is to reduce the delay and power consumption on long wires. Examples are inserting repeaters and buffers on long wires, using low-swing techniques, encoding the transmitted data to reduce hazardous signal transitions, or introducing new technologies such as optical and radio frequency signaling. The other direction is to carefully design the floor plan of the NOC so that the latency on the longest wire does not exceed a certain value, and the average power consumption under uniform traffic is minimized. In this paper, we use conventional global wires with repeaters, and assume that on each link the latency for one flit is always one clock cycle. This is to illustrate the capability of CNOC in the current technology. What we propose in this section is a heuristic floor-planning algorithm for CNOC that minimizes the average power consumption under uniform traffic, and at the same time limits the maximum wire length so that the latency on long wires does not exceed the target value.

A. Routing over resources or routing on dedicated channels

In [2] two kinds of layout styles of on-chip global interconnects are identified. The first one is called routing over resources and the second one is called routing on dedicated channels, as illustrated in Figure 5. In routing over resources, several metal layers, usually two, are reserved for interconnects between the routers and tiles. In this way, the area overhead of the NOC comes entirely from the area of the routers. In routing on dedicated channels, no specific metal layers are reserved, but spaces on the tile edges are reserved for NOC wires.
The space between two tiles is usually equal to the dimension of a router, because an intersection of tile edges usually holds one router. In this paper, we adopt the layout methodology of routing over logic, for its elasticity in global wiring. Also, we assume that only one router can be put in an intersection of tile edges.

Figure 5. Two different NOC layout styles.

B. Wiring density

In evaluating an NOC floor plan, besides its power characteristics, we also need to consider its feasibility in terms of wiring density. According to [9], wiring density is the maximum number of tile-to-tile wires routable across a tile edge. In the scenario of routing over logic, this means that there are some parts of the tiles that are not routable for global interconnects, such as the regions occupied by core logic and first-level caches of the PEs. In [9], it is assumed that 40-60% of the length of a tile edge is routable for global interconnects. In this paper, we assume the dimensions of the tiles are 1.5mm by 1.5mm. Hence, the total wire width on a tile edge cannot exceed 0.6 to 0.9 mm. In this paper, we assume the worst case, so only 0.6mm of a tile edge is routable for CNOC. Also, we assume the wire width to be 400nm for the global interconnect. Hence, each tile edge can accommodate 1500 wires. In other words, each tile edge can accommodate 1500/W links for the NOC, where W is the bus width.
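The wiring-density arithmetic above can be checked with a few lines of integer math. This is a minimal sketch using the figures quoted in the text (1.5 mm tile edge, 40% routable, 400 nm wire width); the function names are hypothetical, and wire spacing is not modelled because the text only gives a wire width.

```python
TILE_EDGE_NM = 1_500_000      # 1.5 mm tile edge
ROUTABLE_PERCENT = 40         # worst case of the 40-60% range
WIRE_WIDTH_NM = 400           # assumed global wire width

def wires_per_tile_edge(edge_nm=TILE_EDGE_NM,
                        routable_percent=ROUTABLE_PERCENT,
                        wire_width_nm=WIRE_WIDTH_NM):
    """Number of wires that fit across the routable part of one tile edge."""
    routable_nm = edge_nm * routable_percent // 100
    return routable_nm // wire_width_nm

def links_per_tile_edge(bus_width_bits):
    """Whole links of the given bus width W that fit across one tile edge."""
    return wires_per_tile_edge() // bus_width_bits

print(wires_per_tile_edge())       # wires across a tile edge
print(links_per_tile_edge(128))    # whole 128-bit links across a tile edge
```

Integer nanometres are used instead of floating-point millimetres so that the division does not pick up rounding error.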

C. Problem Definition

We are given a set of identical tiles, a set of routers, and a set of links that connect the tiles and routers to form a 3-stage Clos network. The positions of the tiles are fixed, while the positions of the routers are variable. A router can be located in any intersection of the tile edges. Once an intersection has been taken, it is not available to other routers. Before describing the objective function, we first define $d_l$ as the Manhattan distance between the two nodes that link $l$ connects, and $E(d_l)$ as the energy that a flit consumes when it travels through a link with Manhattan distance $d_l$. The Manhattan distance between a router and a tile is defined as the Manhattan distance between the two closest points of the tile and router. The floor planning problem is to determine the location of each router so that the objective function Φ is minimized, the maximum wire length is within the boundary to achieve the operating frequency without pipelining, and at the same time the wiring density constraint is met. The physical meaning of Φ is the sum of energy consumption for all links to transfer one flit, given a placement of the routers, that is, $\Phi = \sum_{l} E(d_l)$ over all links $l$. In CNOC, under uniform traffic load, every link has the same probability to transfer a flit if round-robin or random-routing algorithms are applied. Hence Φ can be regarded as an indicator of total power consumption of CNOC under uniform traffic. Figure 6 shows an example of placing two routers on a chip with 3×2 tiles. The Manhattan distance between router A and B is 2. The Manhattan distance between router A and tile k is 3, calculating from A to the upper left corner of tile k.

Figure 6. 3×2 tiles with two routers. The Manhattan distance between router A and tile C is defined as 3.

D. Complexity of the problem

To solve the objective function, the size of the solution space is. To the best of our knowledge only Ye and Micheli [10] have tried to tackle this problem in the tiled CMP NOC context.
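The distance definitions and the objective of Section V.C can be sketched as follows. This is an illustrative sketch under stated assumptions: routers sit on integer grid intersections, each tile is a unit square identified by its lower-left corner, and the per-flit energy is taken to be linear in the Manhattan distance by default, which is a placeholder rather than the paper's energy model. The function names and the example coordinates are hypothetical.

```python
def router_to_router(a, b):
    """Manhattan distance between two routers on grid intersections."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def router_to_tile(router, tile):
    """Manhattan distance from a router to the closest point of a unit
    tile whose lower-left corner is `tile`."""
    x, y = router
    tx, ty = tile
    dx = max(tx - x, 0, x - (tx + 1))
    dy = max(ty - y, 0, y - (ty + 1))
    return dx + dy

def phi(link_lengths, energy=lambda d: d):
    """Sum of per-flit energy over all links for one router placement.
    `energy` maps a Manhattan distance to energy (linear by default,
    which is an assumption for illustration)."""
    return sum(energy(d) for d in link_lengths)

# Hypothetical coordinates giving a router pair 2 apart and a
# router-to-tile distance of 3, the two values quoted for Figure 6.
router_a, router_b, tile = (0, 0), (2, 0), (2, 1)
lengths = [router_to_router(router_a, router_b),
           router_to_tile(router_a, tile)]
print(lengths, phi(lengths))
```

A floor-planning search would evaluate `phi` once per candidate placement and discard placements that violate the wire-length or wiring-density limits.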
What they did was to solve an objective function similar to Φ and obtain a real-valued solution. Then they mapped the real-valued positions into integer coordinates, which form the final floor plan, by sorting the real-valued solution. However, in [10] the wiring density was not considered, which may cause the only solution to be infeasible. Also, the objective function only considers the total wire length, instead of the total power consumption on the wires. For a 3-stage CNOC with 24 routers on a 64-tile CMP, there are $\binom{49}{24}$ possible floor plans for comparison if the exhaustive search algorithm is applied. This is about $6 \times 10^{13}$, which requires a lot of computational power for comparison. What we propose in this paper is a heuristic method that can reduce the solution space to $\binom{N^2/4}{3N/4}$, which is $\binom{16}{6} = 8008$ in a 64-tile CMP, so that the solution can be easily obtained from a few iterations of exhaustive search.

E. Algorithm

As stated in [10], a good floor-planning algorithm should meet the hierarchy constraint. The hierarchy constraint requires clusters of the tiles and routers in the network topology to stay together in the layout. If we apply hierarchy constraints to CNOC, an intuitive idea for the floor planning would be to gather together the PEs or memory modules that are connected with the same IM/OM. In other words, the tiles of the CMP should be grouped spatially, and each group of tiles should be connected by the same IM and OM. Following the hierarchy philosophy, we propose a three-step heuristic floor-planning algorithm. The three steps are tile mapping, router placement, and exhaustive search.

1) Tile mapping: We first divide the chip into four quadrants. Each quadrant contains N/4 groups of tiles, and each group contains N tiles, forming a rectangle, which is associated with a specific IM and OM. Figure 7 illustrates how we divide the 64-tile chip. In the first quadrant, there are 16 tiles.
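The solution-space sizes discussed in Section V.D can be reproduced with binomial coefficients. In this sketch the position counts are assumptions chosen because they are consistent with the quoted figures: 49 candidate intersections for the whole chip (the 7 x 7 interior intersections of an 8 x 8 tile grid) and 16 candidate positions for the 6 routers of one quadrant. The variable names are hypothetical.

```python
from math import comb

ROUTERS = 24                 # 8 IMs + 8 CMs + 8 OMs
CHIP_POSITIONS = 49          # assumed: 7 x 7 interior intersections
QUADRANT_POSITIONS = 16      # assumed candidate positions per quadrant
QUADRANT_ROUTERS = 6         # 3N/4 routers per quadrant with N = 8

# Unconstrained placement: choose 24 of the chip's candidate intersections.
exhaustive = comb(CHIP_POSITIONS, ROUTERS)

# Heuristic: place one quadrant's routers, then mirror by symmetry.
per_quadrant = comb(QUADRANT_POSITIONS, QUADRANT_ROUTERS)

print(f"{exhaustive:.1e}", per_quadrant)
```

Because the remaining quadrants are obtained by symmetry, only the per-quadrant count has to be enumerated, which is why an exhaustive search over it is practical.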
The tiles marked with 1 are associated with IM1 and OM1, and tiles marked with 2 are associated with IM2 and OM2. For the fourth quadrant, we place group 3 and 4 tiles in a similar fashion, and we rotate the first quadrant clockwise 90 degrees for symmetry. The remaining quadrants follow the same logic.

2) Router placement: Given the tile mapping from step 1, the IMs and OMs that are associated with tiles in the first quadrant, and N/4 other arbitrary CMs, must be located in the first quadrant. The same logic applies to the other quadrants. In other words, each quadrant contains N/4 arbitrary CMs, since every CM has to connect to all the IMs and OMs, and IMs and OMs are evenly distributed among the four quadrants. Therefore, in each quadrant there are 3N/4 routers: N/4 IMs, N/4 OMs, and N/4 CMs. We further require that the placement of routers in each quadrant be symmetric. For example, in Figure 7 routers o1, o3, o5, and o7 are in symmetric positions in different quadrants.

3) Exhaustive search: The remaining part of the algorithm is straightforward. For each quadrant, there are $N^2/4$ available positions for the 3N/4 routers. Here, we exclude the periphery of the chip. We examine each possible floor plan for its Φ value and for feasibility in terms of wiring density and maximum wire length. Finally, we obtain the preferred floor plan by sorting the Φ values.

F. Floor plan for the 64-node CNOC

Figure 7 shows the result of our floor-planning algorithm based on the wire characteristics listed in TABLE III, under the constraints that the longest wire length is 7 hops, 40% of the length of the tile edge is routable, any link can contain up to 128 bits, and all the wires can be operated at 1 GHz. In this floor plan, all the CMs are located around the central point of the chip, surrounded by the IMs and OMs. In section VI, we evaluate the power performance of CNOC based on this floor

plan. One thing to notice is that there could be multiple solutions for this problem. For example, in Figure 7 routers o1, o3, o5, and o7 may be moved to the central points of their own quadrants without affecting the power performance and while still meeting the wiring density constraint.

Figure 7. Floor plan of the 64-node 3-stage CNOC.

VI. EXPERIMENTAL RESULTS

In this section, we compare the performance of the VOQ PDRRM router with the VC router, and compare the CNOC with other high-radix NOC topologies.

A. VOQ PDRRM Single Router Performance

Methodology: We used our in-house cycle-accurate simulator to evaluate delay performance for the single router with different schemes. In the baseline VC router, there are eight VCs in an input port. The memory in each input port is shared by all of its eight VCs to increase the buffer utilization and is implemented as linked lists. The VC router follows the design in [14], performing VA and SA in the same clock cycle. VCs are assigned non-speculatively after SA, which uses an input-first allocator, so that VA simply involves finding a free VC for each output port from a pool of free VCs every clock cycle and assigning it to the flits that win SA. For the VOQ router, we apply two different scheduling schemes: DRRM and PDRRM. In the VOQ routers each input port has eight VOQs and the input port memory is shared by the eight VOQs. Figure 8 shows the delay performance results of the VOQ and VC routers with packet lengths equal to eight and input port memory size equal to 256 flits under uniform traffic. For the VC routers, we implemented two kinds of scheduling schemes: Parallel Iterative Matching (PIM) and DRRM.
Both PIM and DRRM are input-first, which means that an input port first chooses an input VC, either randomly (PIM) or according to round-robin order (DRRM), and then an output port chooses an input port to be granted, either randomly (PIM) or according to round-robin order (DRRM). An 8×8 single VC router with 1-iteration PIM can only achieve about 65% throughput. DRRM has a similar throughput to the PIM scheme. In the VOQ schemes, we evaluated three kinds of scheduling algorithms: DRRM, PDRRM I, and PDRRM II. PDRRM I follows the algorithm described in section IV, and PDRRM II has some differences. In PDRRM II, the input pointer is stubborn, meaning that when a local winner in an input port loses arbitration in the output arbiters, the input pointer does not update until it finally gets granted. In this way the property of fairness in DRRM is well preserved. As illustrated in Figure 8, both DRRM and PDRRM II can achieve 100% throughput, while PDRRM I can achieve 81%. DRRM can achieve 100% throughput under uniform traffic because of the desynchronization effect [19]. PDRRM II preserves the stubbornness of DRRM in the input arbiters to maintain the fairness between different VOQs in the same input port, so it can also achieve 100% throughput under uniform traffic. However, from the simulation results we learned that PDRRM II has a much higher average latency under high load than PDRRM I, which is not tolerable in an on-chip environment. Hence throughout this paper we adopt PDRRM I as the scheduling algorithm in CNOC. Though PDRRM I cannot achieve 100% throughput, we show that it can still maintain 78% throughput in the multi-stage configuration, unlike DRRM, which is prone to the TOL blocking problem and drops to 71%.

Figure 8. Latency vs. injection rate for different scheduling schemes in VC and VOQ routers, with packet length equal to eight.

Figure 9. Latency vs. injection rate in CNOC with different router designs.

B. VOQ CNOC VS. VC CNOC

Figure 9
shows the delay performance of a 64-node 3-stage CNOC with different router designs, each with an input port memory size of 256 flits. With the VC router and the DRRM scheduling scheme, the throughput is 62%. As expected, the throughput of the VOQ router with the DRRM scheduling scheme drops to 71% because of the TOL blocking issue. The PDRRM VOQ scheme achieves 78% throughput, the best of the three designs.

C. Latency Performance Comparison between CNOC and Other NOC Topologies

In this subsection, we compare the performance of CNOC with other NOC topologies, including the conventional 2D Mesh and two other high-radix networks: GFT(2,4,4) (Generalized Fat Tree) [3] and CMeshX2 [4]. We chose GFT(2,4,4) and CMeshX2 because both topologies consist of routers with at most 8 input/output ports.

Generalized Fat Tree: The GFT configuration is described in [3]. GFT(p,q,r) means that there are p stages of connections between different stages of routers, and each router has q upward connections and r downward connections. Figure 2 shows the configuration of GFT(2,4,4), which has 32 radix-8 routers and 16 radix-4 routers. All the links in the Fat Tree are bidirectional, so a packet only needs to travel upward to the lowest common ancestor router of its source and destination PEs before travelling downward. We use oblivious routing in the Fat Tree: a packet travels to a randomly chosen lowest common ancestor router and then travels down to the destination along the only route.

Concentrated Mesh: In CMesh, every router is connected to four PEs, and the routers form a mesh network like the conventional 2D Mesh. CMeshX2, which consists of two independent CMesh networks (32 radix-8 routers in a 64-node system), has better power efficiency and twice the bisection bandwidth compared to CMesh, so we selected CMeshX2 with XY routing as one of the high-radix topologies to evaluate. We also tried CMeshX2 with express channels, but under uniform traffic the throughput changes little with express-channel-prioritized XY routing. Therefore, in this paper we only evaluate CMeshX2 without express channels.

Figure 10 illustrates the delay performance of VOQ CNOC with PDRRM and the other three topologies under uniform traffic. The packet size equals eight. For the Fat Tree, we applied two different router designs, VC DRRM and VOQ PDRRM. For CNOC, we set the input port memory size to 256 flits, larger than that of the other four schemes (64 flits), because CNOC has the fewest routers. Even with larger input port memories, the total area consumed by the routers in CNOC is still comparable to that of the other topologies. Table II summarizes the simulation results: CNOC has the smallest zero-load latency and the highest throughput.

Figure 10. Latency vs. injection rate for different topologies with uniform traffic, packet length = 8.

TABLE II. SUMMARY OF THE LATENCY PERFORMANCE COMPARISON
Topology | CNOC | 2D Mesh | Fat Tree | CMeshX2
Router design | PDRRM VOQ | DRRM VC | PDRRM VOQ / DRRM VC | DRRM VC
Total router number | 24 radix-8 | 64 radix | 32 radix-8, 16 radix-4 | 32 radix
Input port memory size | | | |
Throughput (%) | | | |
Zero-load latency (clock cycles) | | | |

D. Power Efficiency Comparison between CNOC and Other Topologies

Methodology: We follow the methodology in [11] to characterize the dynamic power consumption of different NOCs. We assume that each flit traversing a hop consumes $E_{flit} = E_{wrt} + E_{rd} + E_{xb} + E_{link} + E_{arb}$, where $E_{wrt}$ and $E_{rd}$ are the energies consumed by a buffer write and a buffer read, $E_{xb}$ and $E_{link}$ are the energies consumed when a flit traverses a crossbar or a link, and $E_{arb}$ is the energy consumed by the allocators. For the energy consumption on NOC links, we use Cadence Spectre to simulate link models of various lengths, with parameters given by the Predictive Technology Model (PTM). For the energy consumption within NOC routers of different radices, we follow the methodology described in Section IV.D. The energy consumed by one flit operation on the different components at 1 GHz is shown in Table III and Table IV. In this paper, we assume that a flit contains 32 bits and that the global links carry 4 additional bits for control signals.

TABLE III. EXPERIMENTAL RESULTS FOR DIFFERENT LENGTHS OF LINKS
Wire Length (mm) | Delay (ns) | Flit Energy (pJ)

TABLE IV. EXPERIMENTAL RESULTS OF ENERGY FOR ROUTERS OF DIFFERENT RADICES
Energy for 1 flit operation (pJ) | Radix-8 Router | Radix-5 Router | Radix-4 Router

Power Efficiency: To gain more insight into throughput and power performance together, we define a parameter named power efficiency as 1/E, where E = (time for each PE to finish sending 1000 packets at the injection rate that yields an average latency of 100 clock cycles) × (total energy dissipated during the process). This method assumes that the PEs adopt the injection throttling technique [12] to maintain an injection rate that keeps the average
end-to-end latency below a certain level, which is 100 clock cycles in our simulations. The power efficiency is higher if a network spends less time finishing the traffic load and, at the same time, spends less energy in the process. In other words, the more packets a network can transmit per joule, the more power efficient it is. Figure 11 shows the normalized power efficiency of the five evaluated schemes. CNOC has the best power efficiency, followed by the other two high-radix topologies. The 2D Mesh has the worst power efficiency because of its large average hop count and low throughput. CNOC outperforms the Fat Tree with the same high-throughput router because CNOC has fewer routers: the average cost of transferring a packet in CNOC is lower than in the Fat Tree, since fewer routers are involved in the operation.

Figure 11. Normalized network power efficiency for four topologies.

VII. CONCLUSION

Low-radix NOC architectures do not scale well with the increasing number of PEs in a CMP. The diameter of the network rises rapidly with the growing number of tiles, making the end-to-end communication latency intolerable. Many high-radix NOC topologies have been proposed to improve network performance. Among these, we believe the Clos Network-on-Chip (CNOC) is the most promising, with its low average hop count and good load-balancing characteristics. In this paper, we propose (1) a PDRRM VOQ router design to achieve high speed and high throughput in CNOC, and (2) a heuristic floor-planning algorithm to minimize the power consumption caused by long wires. Experimental results show that the throughput of a 64-node 3-stage CNOC under uniform traffic increases from 62% to 78% when the baseline Virtual Channel (VC) routers are replaced with PDRRM VOQ routers.
We also compared CNOC with other high-radix NOC topologies under the same upper bound on router radix, and found that with the new design techniques CNOC has the highest throughput, lowest zero-load latency, and best power efficiency.

ACKNOWLEDGMENT

This work was supported by a Polytechnic Institute of NYU Angel Fund grant and by U.S. Army CERDEC. The authors would also like to thank Dr. N. Sertac Artan and Dr. Yang Xu for their valuable comments, and Chao Zeng and Zhenyu Sun for their help with the simulations.

REFERENCES

[1] C. Clos, "A study of non-blocking switching networks," Bell System Technical Journal, pp , 424, Mar
[2] D. Pamunuwa, J. Oberg, L. R. Zheng, M. Millberg, A. Jantsch, and H. Tenhunen, "Layout, performance and power trade-offs in mesh-based network-on-chip architectures," in Proc. IFIP Int. Conf. Very Large Scale Integr., Dec. 2003, p
[3] S. R. Ohring, M. Ibel, S. K. Das, and M. J. Kumar, "On generalized fat trees," Proceedings of the 9th Intl. Symp. on Parallel Processing, Washington DC,
[4] J. Balfour and W. J. Dally, "Design tradeoffs for tiled CMP on-chip networks," in ICS 06: Proc. of the 20th annual international conference on Supercomputing, 2006, pp
[5] J. Kim, J. Balfour, W. J. Dally, "Flattened Butterfly Topology for On-Chip Networks," IEEE Computer Architecture Letters, vol. 6, no. 2, pp , July-Dec
[6] D. Ludovici, F. Gilabert, S. Medardoni, and C. Gomez, "Assessing Fat-Tree Topologies for Regular Network-on-Chip Design under Nanoscale Technology Constraints," Proc. of Conf. on Design, Automation and Test in Europe,
[7] A. Adriahantenaina, H. Charlery, A. Greiner, L. Mortiez, and C. A. Zeferino, "SPIN: a scalable, packet switched, on-chip micro-network," in Design Automation and Test in Europe Conf. and Exhibition,
[8] C. Gomez, F. Gilabert, M. E. Gomez, P. Lopez and J. Duato, "RUFT: Simplifying the Fat-tree Topology," in Proceedings of the 14th IEEE International Conference on Parallel and Distributed Systems,
[9] D. N. Jayasimha, B. Zafar, and Y. Hoskote, "On-Chip Interconnection Networks: Why They are Different and How to Compare them," Technical Report, Intel Corp,
[10] T. T. Ye and G. De Micheli, "Physical planning for multiprocessor networks and switch fabrics," in Proc. ASAP,
[11] H. Wang, X. Zhu, L. S. Peh, and S. Malik, "Orion: a power-performance simulator for interconnection networks," in Proc. of the 35th Intl. Symp. on Microarchitecture,
[12] E. Baydal, P. Lopez, and J. Duato, "A Family of Mechanisms for Congestion Control in Wormhole Networks," IEEE Trans. on Parallel and Distributed Systems, 16(9): ,
[13] William J. Dally, "Wire-Efficient VLSI Multiprocessor Communication Networks," Proceedings of the Stanford Conference on Advanced Research in VLSI, Paul Losleben, ed., MIT Press, March 1987, pp
[14] Amit Kumar, Partha Kundu, Arvind Singh, Li-Shiuan Peh and Niraj K. Jha, "A 4.6Tbits/s 3.6GHz Single-cycle NoC Router with a Novel Switch Allocator in 65nm CMOS," in 25th International Conference on Computer Design, October
[15] John Kim, William J. Dally, Brian Towles, Amit K. Gupta, "Microarchitecture of a High-Radix Router," Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA), June 2005, pp
[16] V. Soteriou, R. S. Ramanujam, B. Lin, and Li-Shiuan Peh, "A High-Throughput Distributed Shared-Buffer NoC Router," Computer Architecture Letters, vol. 8, no. 1, Jan. 2009, pp
[17] H. J. Chao and J. S. Park, "Centralized contention resolution schemes for a large-capacity optical ATM switch," in Proc. IEEE ATM Workshop, Fairfax, Virginia, May 1998
[18] N. McKeown, "The iSLIP scheduling algorithm for input-queued switches," IEEE/ACM Transactions on Networking, vol. 7, no. 2, pp , Apr
[19] H. Jonathan Chao and Bin Liu, High Performance Switches and Routers.
[20] W. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers Inc., San Francisco, CA,

Reservation Cut-through Switching Allocation for High-Radix Clos Network on Chip

Reservation Cut-through Switching Allocation for High-Radix Clos Network on Chip Reservation Cut-through Switching Allocation for High-Radix Clos Network on Chip Yang Xu, Tang Jiang, Ming Yang and H. Jonathan Chao Department of Electrical and Computer Engineering Polytechnic Institute

More information

Scalable Schedulers for High-Performance Switches

Scalable Schedulers for High-Performance Switches Scalable Schedulers for High-Performance Switches Chuanjun Li and S Q Zheng Mei Yang Department of Computer Science Department of Computer Science University of Texas at Dallas Columbus State University

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching.

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching. Switching/Flow Control Overview Interconnection Networks: Flow Control and Microarchitecture Topology: determines connectivity of network Routing: determines paths through network Flow Control: determine

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID 1 Virtual Channel Flow Control Each switch has multiple virtual channels per phys. channel Each virtual

More information

Evaluation of NOC Using Tightly Coupled Router Architecture

Evaluation of NOC Using Tightly Coupled Router Architecture IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 01-05 www.iosrjournals.org Evaluation of NOC Using Tightly Coupled Router

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Butterfly vs. Unidirectional Fat-Trees for Networks-on-Chip: not a Mere Permutation of Outputs

Butterfly vs. Unidirectional Fat-Trees for Networks-on-Chip: not a Mere Permutation of Outputs Butterfly vs. Unidirectional Fat-Trees for Networks-on-Chip: not a Mere Permutation of Outputs D. Ludovici, F. Gilabert, C. Gómez, M.E. Gómez, P. López, G.N. Gaydadjiev, and J. Duato Dept. of Computer

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

K-Selector-Based Dispatching Algorithm for Clos-Network Switches

K-Selector-Based Dispatching Algorithm for Clos-Network Switches K-Selector-Based Dispatching Algorithm for Clos-Network Switches Mei Yang, Mayauna McCullough, Yingtao Jiang, and Jun Zheng Department of Electrical and Computer Engineering, University of Nevada Las Vegas,

More information

WITH the development of the semiconductor technology,

WITH the development of the semiconductor technology, Dual-Link Hierarchical Cluster-Based Interconnect Architecture for 3D Network on Chip Guang Sun, Yong Li, Yuanyuan Zhang, Shijun Lin, Li Su, Depeng Jin and Lieguang zeng Abstract Network on Chip (NoC)

More information

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip Anh T. Tran and Bevan M. Baas Department of Electrical and Computer Engineering University of California - Davis, USA {anhtr,

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC 1 Pawar Ruchira Pradeep M. E, E&TC Signal Processing, Dr. D Y Patil School of engineering, Ambi, Pune Email: 1 ruchira4391@gmail.com

More information

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU Thomas Moscibroda Microsoft Research Onur Mutlu CMU CPU+L1 CPU+L1 CPU+L1 CPU+L1 Multi-core Chip Cache -Bank Cache -Bank Cache -Bank Cache -Bank CPU+L1 CPU+L1 CPU+L1 CPU+L1 Accelerator, etc Cache -Bank

More information

NoC Test-Chip Project: Working Document

NoC Test-Chip Project: Working Document NoC Test-Chip Project: Working Document Michele Petracca, Omar Ahmad, Young Jin Yoon, Frank Zovko, Luca Carloni and Kenneth Shepard I. INTRODUCTION This document describes the low-power high-performance

More information

Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks

Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks Technical Report #2012-2-1, Department of Computer Science and Engineering, Texas A&M University Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks Minseon Ahn,

More information

Lecture 22: Router Design

Lecture 22: Router Design Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO 03, Princeton A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip

More information

Concurrent Round-Robin Dispatching Scheme in a Clos-Network Switch

Concurrent Round-Robin Dispatching Scheme in a Clos-Network Switch Concurrent Round-Robin Dispatching Scheme in a Clos-Network Switch Eiji Oki * Zhigang Jing Roberto Rojas-Cessa H. Jonathan Chao NTT Network Service Systems Laboratories Department of Electrical Engineering

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK DOI: 10.21917/ijct.2012.0092 HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK U. Saravanakumar 1, R. Rangarajan 2 and K. Rajasekar 3 1,3 Department of Electronics and Communication

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

ES1 An Introduction to On-chip Networks

ES1 An Introduction to On-chip Networks December 17th, 2015 ES1 An Introduction to On-chip Networks Davide Zoni PhD mail: davide.zoni@polimi.it webpage: home.dei.polimi.it/zoni Sources Main Reference Book (for the examination) Designing Network-on-Chip

More information

A Novel Energy Efficient Source Routing for Mesh NoCs

A Novel Energy Efficient Source Routing for Mesh NoCs 2014 Fourth International Conference on Advances in Computing and Communications A ovel Energy Efficient Source Routing for Mesh ocs Meril Rani John, Reenu James, John Jose, Elizabeth Isaac, Jobin K. Antony

More information

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems Mohammad Ali Jabraeil Jamali, Ahmad Khademzadeh Abstract The success of an electronic system in a System-on- Chip is highly

More information

A NEW ROUTER ARCHITECTURE FOR DIFFERENT NETWORK- ON-CHIP TOPOLOGIES

A NEW ROUTER ARCHITECTURE FOR DIFFERENT NETWORK- ON-CHIP TOPOLOGIES A NEW ROUTER ARCHITECTURE FOR DIFFERENT NETWORK- ON-CHIP TOPOLOGIES 1 Jaya R. Surywanshi, 2 Dr. Dinesh V. Padole 1,2 Department of Electronics Engineering, G. H. Raisoni College of Engineering, Nagpur

More information

Low-Power Interconnection Networks

Low-Power Interconnection Networks Low-Power Interconnection Networks Li-Shiuan Peh Associate Professor EECS, CSAIL & MTL MIT 1 Moore s Law: Double the number of transistors on chip every 2 years 1970: Clock speed: 108kHz No. transistors:

More information

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC QoS Aware BiNoC Architecture Shih-Hsin Lo, Ying-Cherng Lan, Hsin-Hsien Hsien Yeh, Wen-Chung Tsai, Yu-Hen Hu, and Sao-Jie Chen Ying-Cherng Lan CAD System Lab Graduate Institute of Electronics Engineering

More information

Design and Implementation of Multistage Interconnection Networks for SoC Networks

Design and Implementation of Multistage Interconnection Networks for SoC Networks International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.5, October 212 Design and Implementation of Multistage Interconnection Networks for SoC Networks Mahsa

More information

Fair Chance Round Robin Arbiter

Fair Chance Round Robin Arbiter Fair Chance Round Robin Arbiter Prateek Karanpuria B.Tech student, ECE branch Sir Padampat Singhania University Udaipur (Raj.), India ABSTRACT With the advancement of Network-on-chip (NoC), fast and fair

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

Lecture 2: Topology - I

Lecture 2: Topology - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and

More information

PCRRD: A Pipeline-Based Concurrent Round-Robin Dispatching Scheme for Clos-Network Switches

PCRRD: A Pipeline-Based Concurrent Round-Robin Dispatching Scheme for Clos-Network Switches : A Pipeline-Based Concurrent Round-Robin Dispatching Scheme for Clos-Network Switches Eiji Oki, Roberto Rojas-Cessa, and H. Jonathan Chao Abstract This paper proposes a pipeline-based concurrent round-robin

More information

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip ASP-DAC 2010 20 Jan 2010 Session 6C Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip Jonas Diemer, Rolf Ernst TU Braunschweig, Germany diemer@ida.ing.tu-bs.de Michael Kauschke Intel,

More information

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER A Thesis by SUNGHO PARK Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect 1 A Soft Tolerant Network-on-Chip Router Pipeline for Multi-core Systems Pavan Poluri and Ahmed Louri Department of Electrical and Computer Engineering, University of Arizona Email: pavanp@email.arizona.edu,

More information

Prevention Flow-Control for Low Latency Torus Networks-on-Chip

Prevention Flow-Control for Low Latency Torus Networks-on-Chip revention Flow-Control for Low Latency Torus Networks-on-Chip Arpit Joshi Computer Architecture and Systems Lab Department of Computer Science & Engineering Indian Institute of Technology, Madras arpitj@cse.iitm.ac.in

More information

Lecture 3: Flow-Control

Lecture 3: Flow-Control High-Performance On-Chip Interconnects for Emerging SoCs http://tusharkrishna.ece.gatech.edu/teaching/nocs_acaces17/ ACACES Summer School 2017 Lecture 3: Flow-Control Tushar Krishna Assistant Professor

More information

A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin

A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin 50 IEEE EMBEDDED SYSTEMS LETTERS, VOL. 1, NO. 2, AUGUST 2009 A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin Abstract Programmable many-core processors are poised

More information

SERVICE ORIENTED REAL-TIME BUFFER MANAGEMENT FOR QOS ON ADAPTIVE ROUTERS

SERVICE ORIENTED REAL-TIME BUFFER MANAGEMENT FOR QOS ON ADAPTIVE ROUTERS SERVICE ORIENTED REAL-TIME BUFFER MANAGEMENT FOR QOS ON ADAPTIVE ROUTERS 1 SARAVANAN.K, 2 R.M.SURESH 1 Asst.Professor,Department of Information Technology, Velammal Engineering College, Chennai, Tamilnadu,

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS. A Thesis SONALI MAHAPATRA

PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS. A Thesis SONALI MAHAPATRA PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS A Thesis by SONALI MAHAPATRA Submitted to the Office of Graduate and Professional Studies of Texas A&M University

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip Nasibeh Teimouri

More information

Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks

Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks Department of Computer Science and Engineering, Texas A&M University Technical eport #2010-3-1 seudo-circuit: Accelerating Communication for On-Chip Interconnection Networks Minseon Ahn, Eun Jung Kim Department

More information

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS Chandrika D.N 1, Nirmala. L 2 1 M.Tech Scholar, 2 Sr. Asst. Prof, Department of electronics and communication engineering, REVA Institute

More information

ScienceDirect. Packet-based Adaptive Virtual Channel Configuration for NoC Systems

ScienceDirect. Packet-based Adaptive Virtual Channel Configuration for NoC Systems Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 34 (2014 ) 552 558 2014 International Workshop on the Design and Performance of Network on Chip (DPNoC 2014) Packet-based

More information
