SYSARC 768 No. of Pages 14, Model 5+ ARTICLE IN PRESS UNCORRECTED PROOF


Journal of Systems Architecture xxx (2007) xxx-xxx

Deadlock free routing algorithms for irregular mesh topology NoC systems with rectangular regions

Rickard Holsmark a,*, Maurizio Palesi b, Shashi Kumar a

a School of Engineering, Jönköping University, Sweden
b DIIT, University of Catania, Italy

Received 15 December 2006; received in revised form 18 May 2007; accepted 17 July

Abstract

The simplicity of the regular mesh topology Network on Chip (NoC) architecture leads to reductions in design time and manufacturing cost. A weakness of the regular shaped architecture is its inability to efficiently support cores of different sizes. One way proposed in the literature to deal with this is the region concept, which helps to accommodate cores larger than the tile size in mesh topology NoC architectures. The region concept offers many new opportunities for NoC design, but also raises new design issues and challenges. One of the most important among these is the design of an efficient deadlock free routing algorithm. Available adaptive routing algorithms developed for regular mesh topology cannot ensure freedom from deadlocks. In this paper, we list and discuss many new design issues which need to be handled when designing NoC systems incorporating cores larger than the tile size. We also present and compare two deadlock free routing algorithms for mesh topology NoC with regions. The idea of the first algorithm is borrowed from the area of fault tolerant networks, where a network topology is rendered irregular due to faults in routers or links, and is adapted for the new context. We compare this with an algorithm designed using a methodology for design of application specific routing algorithms for communication networks. The application specific routing algorithm tries to maximize adaptivity by using static and dynamic communication requirements of the application.
Our study shows that the application specific routing algorithm not only provides much higher adaptivity, but also superior performance compared to the other algorithm in all traffic cases. However, this higher performance comes at a higher area cost for implementing the network routers.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Networks on Chip; Mesh topology; Routing algorithms; Wormhole switching; Deadlock; Application specific routing

This paper is an extended version of the paper presented at DSD 2006 [26].
* Corresponding author. E-mail address: hori@ing.hj.se (R. Holsmark).
/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi: /j.sysarc

1. Introduction

Network on Chip (NoC) is slowly being accepted as an important paradigm for implementing communication among various cores in a SoC. Network topology and routing algorithms are two of the most important aspects which distinguish various proposed NoC architectures [1-5]. Fixed tile size based two dimensional mesh topology is favored by many research groups because of its layout efficiency, good electrical properties and simplicity in addressing on-chip resources. Such a physically homogeneous network is not efficient for incorporating cores of different sizes. In such a network, the tile size must be able to accommodate the physically largest core, such as a shared memory. It will also be hard to reuse earlier designed multi-core sub-systems within a fixed tile size based NoC. To overcome these problems the concept of a region was proposed in [1]. This concept allows a rectangular area in the mesh, larger than a tile, to be declared as a region. The region is isolated from the outside network using a wrapper as shown in Fig. 1. There are many advantages of using a modified mesh topology NoC to handle cores larger than the tile size rather than developing a

new topology for each new SoC. The modified topology automatically inherits the scalability property of the underlying mesh topology. Because the links use wires of known and uniform length, it is possible to guarantee good electrical properties of the signals. The design of the routers can also be reused across designs.

Fig. 1. Region within a mesh topology NoC (labels: NoC router, region wrapper, normal sized NoC tile, region).

In a NoC system with regions, routing of packets becomes more complex. Some network routers are removed from the mesh network to accommodate a large region. In effect, a region acts as an obstacle to the network traffic. This not only results in higher packet latency, but also means that deadlock free routing algorithms designed for regular mesh networks are no longer usable.

Routing in networks has been classified in several ways in the literature [6]. Routing schemes can be classified as source routing or distributed routing. In source routing the source node decides the entire path for a packet and appends it as a field in the packet. In this scheme there is no possibility of adapting the route after the packet leaves the source. In distributed routing schemes a router, on receiving the packet, decides whether it should be delivered to the local resource or forwarded to a neighboring router. Routing algorithms are also classified as deterministic or adaptive. In deterministic routing the routing path is decided only from the source and destination addresses. In an adaptive routing scheme multiple paths from the source node to the destination node are possible. A particular path can be selected to optimize certain performance parameters.

Two properties which are necessary in all usable routing algorithms are deadlock freedom and livelock freedom.
These two properties respectively ensure that packets are not blocked in the network forever and do not wander across the network indefinitely [7]. A large number of algorithms exist for regular topology networks which ensure these properties. But only a few algorithms exist for irregular topologies which are both efficient and allow deadlock free routing. Another desirable property of a routing algorithm is that it gives fair and uniform performance to all equal priority traffic in the network. This property is harder to achieve in a network with irregular topology than in a regular topology network. This paper resulted from an effort to search for an algorithm which can handle the irregularity induced in a regular mesh topology by multiple rectangular regions of various sizes, and which can provide these properties.

The cost of a routing scheme is reflected in the implementation cost of the router. Generally, there is a tradeoff between cost and performance, implying that routing schemes providing higher performance are costlier than routing schemes with lower performance. Recently, power consumption is also being considered as a cost parameter in the design of network on chip architectures [8]. This paper focuses on evaluating and comparing the performance of two distinct types of adaptive deadlock free routing algorithms for irregular topology mesh networks.

The rest of the paper is organized as follows. In Section 2 we review related work. Section 3 presents the region concept and lists its applications and design issues in SoC design. In Section 4, we discuss the important issue of deadlock free routing and describe two different types of routing algorithms that can be used for NoC platforms with regions. We also briefly discuss the hardware implications of the algorithms.
In Section 5 we present an evaluation of these two routing algorithms and results comparing their performance for synthetic communication traffic as well as traffic in a realistic multi-media application. Section 6 concludes the paper and lists some research problems for the future.

2. Related work

Many factors affect the overall performance of a NoC. Network topology, flow control mechanism, switching technique and routing algorithm represent just a short list. In this paper we focus on routing algorithms in which the underlying switching technique is based on the wormhole concept [9]. Wormhole switching, used in communication networks, is proposed by several researchers (e.g., [4]) as most suitable for on-chip communication. It is preferred for two main reasons. First, it requires smaller router buffers than the store-and-forward switching scheme. Second, network latency becomes relatively insensitive to path length due to the pipelined nature of the flow of flits. Unfortunately, wormhole routing is very susceptible to deadlocks because messages are allowed to hold many resources while requesting others.

To solve the problem of deadlock, many algorithms have been proposed in the literature for mesh topology networks. For example, the simple X-Y routing algorithm and Turn-model based [10] algorithms like west-first are deadlock free in mesh networks. However, none of these can be used for meshes with regions, since circumventing a region is impossible because of the restrictions on the allowed turns.

Neeb et al. [11] have proposed a methodology called INoC in which a customized topology and a customized deadlock free routing algorithm are designed for an application. They show that for irregular traffic loads, the performance of the INoC approach is better than that of regular topologies like mesh, torus and Spidergon. As the first step, the INoC approach starts with a floor-plan of the required hardware resources and a bidirectional chain topology in which every pair of resources has a path between them. Additional shortcut channels are added to increase the bandwidth required to satisfy the application's traffic. A change in traffic patterns requires re-computation of the required shortcuts. A table based router design is assumed for implementation. The approach does not consider the physical sizes of the various cores and their effect on the physical layout.

Bolotin et al. [4] have proposed a non-homogeneous mesh topology NoC architecture allowing rectangular cores larger than the mesh tile. Their solution to deadlock free routing is to extend X-Y routing with hard coded paths which are computed off-line. This solution is difficult to reuse across applications and cannot accommodate modifications in the communication topology of an application.

A problem similar to regions can be recognized when designing fault-tolerant routing algorithms for mesh networks. Several of these algorithms consider faults to be contained in rectangular blocks similar to regions. In this category, virtual channels [12] have been used to facilitate the design of such algorithms. In [13] Boppana and Chalasani show that, using just one extra virtual channel per physical channel, the well-known e-cube algorithm can be used to provide deadlock free routing in networks with non-overlapping fault rings. In the same paper the authors prove that at most four additional virtual channels are sufficient to make fully adaptive algorithms tolerant to multiple faulty blocks in n-dimensional meshes.
A deterministic fault-tolerant wormhole routing algorithm for mesh networks is presented by Zhou and Lau in [14]. The proposed algorithm can tolerate convex fault-connected regions but requires three virtual channels. However, the use of virtual channels adds resources and increases design complexity. Some researchers have proposed fault tolerant algorithms without the use of virtual channels. These are based on non-adaptive routing algorithms that are modified to work in the presence of faults or regions. In [15], Wu proposes modifications to the X-Y routing algorithm to route around faulty blocks, but also imposes some restrictions. In [16] a less restricted algorithm was proposed by Chen and Chiu. Based on [16], a non-minimal deadlock free routing algorithm is also described for irregular topology NoC with regions in [17,39]. Mejia et al. in [18] propose a deterministic routing methodology for tori and meshes which achieves high performance without the use of virtual channels. Furthermore, it is topology-agnostic, meaning it can handle any topology derived from any combination of faults. Unfortunately all the aforementioned routing algorithms are deterministic, i.e. they do not allow adaptivity to communication traffic.

Adaptivity is the ability of a routing algorithm to adapt to changing situations. Therefore, the number of alternative paths provided by a routing algorithm for routing a message from a source node to a destination node can be used as a measure of its adaptivity. A routing algorithm with high adaptivity also has the potential to provide high performance (low latency, low packet drop and high throughput), fault tolerance and uniform utilization of network resources.
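As a small illustration of adaptivity measured by the number of alternative paths, the sketch below counts the minimal paths between two nodes of a regular 2D mesh. It assumes a mesh without regions and without any turn restrictions, so the count is only an upper bound on what a deadlock free algorithm can offer; the function name `minimal_path_count` and the (x, y) coordinate convention are illustrative and not taken from the paper.

```python
from math import comb


def minimal_path_count(src, dst):
    """Count the minimal (shortest) paths between two nodes of a regular 2D mesh.

    Assumes no regions and no turn restrictions: every minimal path is an
    interleaving of dx horizontal hops and dy vertical hops.
    """
    dx = abs(dst[0] - src[0])
    dy = abs(dst[1] - src[1])
    return comb(dx + dy, dx)


# A deterministic algorithm such as X-Y routing uses exactly one of these paths.
print(minimal_path_count((0, 0), (2, 2)))  # 6
print(minimal_path_count((0, 0), (3, 0)))  # 1
```

Under these assumptions, a fully adaptive minimal algorithm could choose among six paths across a 2 x 2 hop quadrant, whereas a deterministic algorithm always uses one.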
Of course, adaptivity has some drawbacks, such as the problem that packets can reach the destination out of order due to the difference in congestion levels on the multiple paths. However, different approaches have been proposed in the literature to cope with this problem, like the use of a simple re-ordering mechanism at network reconvergent nodes proposed by Murali et al. in [19].

One of the most important steps in the development of a theoretical framework for the design of adaptive deadlock free routing algorithms is due to Duato. In [20] he proposed a general theory to develop highly adaptive deadlock free routing algorithms for a general communication network which uses the wormhole switching technique. Duato's theory is based on the idea of channel dependency graphs [21]. These graphs are used to identify a set of consecutive communication channels in the network which, if used concurrently, can cause a deadlock situation. If no cycles exist in such a graph, the analyzed routing algorithm is deadlock free. Duato's theory does not exploit knowledge of the communication traffic characteristics, since it was designed for a general-purpose domain where virtually each network node can communicate with any other node of the network.

In [22] we focused on the embedded system domain where, often, knowledge of the communication traffic characteristics is available at design time. We took advantage of this additional knowledge to extend Duato's theory in such a way as to generate highly adaptive and deadlock free application specific routing algorithms. The approach, named APSRA (Application Specific Routing Algorithm), has been evaluated on homogeneous 2D mesh NoC architectures and compared with turn model based routing algorithms. However, the approach is general and can be applied to any network topology, such as a non-homogeneous 2D mesh with regions.
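The acyclicity test on a channel dependency graph can be sketched in a few lines. The code below is a simplified illustration, not the construction used in [20] or [21]: it assumes a regular mesh without regions, and it approximates the dependencies of a routing algorithm by the turns that the algorithm permits between consecutive channels. All names (`mesh_channels`, `build_cdg`, `has_cycle`, `all_turns`, `xy_turns`) are hypothetical.

```python
from itertools import product


def mesh_channels(width, height):
    """List every directed link (channel) of a regular width x height mesh."""
    channels = []
    for x, y in product(range(width), range(height)):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                channels.append(((x, y), (nx, ny)))
    return channels


def direction(channel):
    (x0, y0), (x1, y1) = channel
    return (x1 - x0, y1 - y0)


def build_cdg(channels, turn_allowed):
    """Add an edge c1 -> c2 when a packet may use c2 directly after c1."""
    cdg = {c: [] for c in channels}
    for c1 in channels:
        for c2 in channels:
            consecutive = c1[1] == c2[0] and c2[1] != c1[0]  # no 180-degree turns
            if consecutive and turn_allowed(direction(c1), direction(c2)):
                cdg[c1].append(c2)
    return cdg


def has_cycle(graph):
    """Iterative depth-first search; a grey node reached again means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}
    for root in graph:
        if colour[root] != WHITE:
            continue
        colour[root] = GREY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if colour[nxt] == GREY:
                    return True
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                colour[node] = BLACK
                stack.pop()
    return False


def all_turns(d1, d2):
    """No restriction: every turn is permitted."""
    return True


def xy_turns(d1, d2):
    """X-Y routing: a packet travelling in Y never turns back into X."""
    return not (d1[0] == 0 and d2[0] != 0)


channels = mesh_channels(3, 3)
print(has_cycle(build_cdg(channels, all_turns)))  # True
print(has_cycle(build_cdg(channels, xy_turns)))   # False
```

With every turn permitted the graph contains cycles, so deadlock freedom cannot be concluded from it, whereas the turn restriction of X-Y routing leaves the graph acyclic.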
In this work we apply the APSRA methodology to develop routing algorithms for mesh topologies in which irregularity has been introduced by incorporating regions.

This paper makes the following main contributions. We list the issues and problems that arise when designing a mesh topology NoC system using cores larger than the tile size. We propose two distinct approaches for designing deadlock free routing algorithms for this special type of irregular topology network, and compare their performance and cost. The simulation based performance analysis clearly demonstrates that the APSRA approach is distinctly better.

3. Region concept and new design issues

The region concept presented in [1] was intended for the use of larger resources which do not fit in the fixed sized slot of a regular mesh architecture layout. The region concept could

in addition be useful for encapsulating a group of resources which have very high and special communication requirements that cannot be supported by the general NoC communication infrastructure. Within such a region, there could be specialized interconnections as well as communication protocols for achieving the required performance. The concept also allows encapsulation of a group of resources as a region for special requirements such as power consumption or data security.

The above applications of a region imply that the region structure is physically different in design from its surroundings. This is however not necessary; it is possible that the region is defined as a logical structure. In this case the internal hardware design of the region is identical with the outside NoC structure but is logically isolated from the surrounding network. This assumes that there are configurable routers in the NoC that can be used for defining and maintaining a region. These routers on the region boundary isolate the computation and communication within the region from external traffic. Another application of the region concept is to support different configurations of power/performance modes of resources inside a region by control of operating voltage, clock frequency etc.

We argue that reuse of multi-core subsystems will become a very important application of the region concept in the near future. The region concept can, for example, be applied for the reuse of subsystems which have been developed for efficient processing of multi-media applications. These solutions are currently available as separate SoCs.
Hence, the concept of region offers the possibility of raising the level of reuse from a core to a level where specially designed multi-core subsystems can be reused. It is unlikely that these subsystems will physically fit in the general slot for a core in the mesh NoC. Without the region concept the subsystem will have to be redesigned keeping in view the NoC constraints. The effort required to redesign may be too high, or the redesigned subsystem may not be able to achieve the required performance in the NoC context. Fig. 2 illustrates the possibility of reusing a multi-core SoC, presented in [23], as a NoC region. The region concept presented in [1] suggested a convex shape of a region. This is easier to handle in terms of routing but may not be optimal with respect to placement and shape of the region.

3.1. Routing in NoC with regions

Efficient routing of messages within the network is essential in order to fully exploit the power of the computing resources and achieve good performance for applications running on them. A good routing algorithm should not only provide low latency for messages but should also be deadlock free when the network is concurrently routing multiple messages. However, incorporating regions in mesh networks results in a major change of the communication infrastructure, and the existing mesh routing algorithms cannot be directly reused.

In addition to creating problems of deadlock freedom, regions also affect the traffic distribution in the network. Traffic flows which get obstructed by the region have to circumvent it in order to make progress. This could make the border links of the region more heavily used than other links. Adaptive routing is one solution that can reduce the problem of local congestion. Normally, the term adaptive refers to the possibility of sensing congestion and taking action to divert from it. In this sense it is reactive.
When regions are used in a NoC, information about them can be incorporated in the routing algorithm so that the occurrence of congestion is reduced or avoided.

3.2. Accessing and addressing regions

How the region is accessed and how it can access other resources is an important issue when designing with regions. Since a region occupies a larger area than a standard resource, it may be useful to consider several addresses and several access points to it. A large region may internally provide different types of access mechanisms to its internal resources. The external access points have to be properly connected to the internal access mechanisms. The purpose for which the region is used can also affect how the region is designed. A large shared memory is likely to require several access points distributed around the entire border, whereas a system with many processing elements may be accessed only by a few resources outside the region. The number of access points determines the communication bandwidth between a region and the rest of the network; the position of access points on the region boundary affects the communication latency of data.

Fig. 2. Multi-core subsystem [23] as a NoC region.

When using a region, its access points and addresses must be defined. The three major options, in

order of increased routing complexity and accessing power are:

- Use the corner router which originally had a resource connected to it as a single access point.
- Use the routers on the border that originally had connections to resources within the region as multiple access points.
- Use all the possible routers on the boundary as multiple access points to the region. In this case some routers are connected to both a standard resource and a region.

Fig. 2 illustrates how a region can be accessed using multiple access points. The routers through which access is possible are shaded gray.

3.3. Design issues

As described in the previous section, the concept of region extends the possibilities for NoC paradigm based SoC design in many interesting and useful ways. Many new design space exploration activities can be performed in NoC systems with regions. Here we list a few new degrees of freedom available for exploration.

- Placement of the region.
- Shape or aspect ratio of the region.
- Number of access points to the region.
- Position of access points on the region boundary.

These parameters span a huge design space which poses new challenges in the research area of design space exploration strategies.

4. Deadlock free routing algorithms for NoC systems

The deadlock free algorithms developed for homogeneous mesh networks, like the Odd-Even routing algorithm [24], cannot be directly used in NoC with regions. To be able to reach all destinations the routing algorithm has to decide upon turns to get around the region. This will in many situations violate rules that were used to secure the deadlock freeness property in the case of a homogeneous topology NoC. Breaking these rules in order to reach a destination may result in a deadlock situation.
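A minimal sketch can make the obstruction concrete. It uses plain X-Y routing as a simpler stand-in for turn-restricted algorithms, and it assumes that every router covered by the rectangular region is removed from the mesh; the function name `xy_route` and the `(x_min, y_min, x_max, y_max)` region encoding are illustrative choices, not notation from the paper.

```python
def xy_route(src, dst, region):
    """Follow plain X-Y routing from src to dst in a 2D mesh.

    region is (x_min, y_min, x_max, y_max), an inclusive rectangle of mesh
    positions whose routers are assumed to have been removed.
    Returns the list of visited nodes, or None if the route hits the region.
    """
    def inside(node):
        return (region[0] <= node[0] <= region[2]
                and region[1] <= node[1] <= region[3])

    x, y = src
    path = [src]
    while (x, y) != dst:
        if x != dst[0]:                 # first correct the X offset
            x += 1 if dst[0] > x else -1
        else:                           # then correct the Y offset
            y += 1 if dst[1] > y else -1
        if inside((x, y)):
            return None                 # X-Y routing has no alternative turn
        path.append((x, y))
    return path


region = (1, 1, 2, 2)                      # a 2 x 2 region in a 5 x 5 mesh
print(xy_route((0, 0), (4, 0), region))    # unobstructed row
print(xy_route((0, 1), (4, 1), region))    # None: blocked by the region
```

To deliver the blocked message, the router at (0, 1) would have to move in Y before X and later turn back into X, which is exactly the kind of turn that X-Y routing forbids in order to stay deadlock free.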
In the following subsections we describe two routing algorithms that we have used in our evaluation of routing performance of NoC in the presence of regions. They represent two distinct approaches that can be applied to guarantee deadlock free routing in a NoC both with and without regions. Due to cost considerations of on-chip resources, we only present algorithms that do not require virtual channels. However, it is possible to include this feature to increase network performance even further.

The first approach is adopted from the area of fault tolerant routing. It is a general routing algorithm in the sense that it works for any traffic scenario and region placement in a NoC. This results in good scalability, and it supports dynamic changes of both architecture and communication patterns.

The second approach has evolved from knowledge of the design optimization of embedded systems. It relies on the assumption that communication among tasks in an embedded application is known in advance. This information about the communication is incorporated when designing the routing algorithm. As we do not need to consider all possible communication patterns, fewer restrictions have to be applied to the routes of the actual communications to avoid deadlocks. Thus, an application specific routing algorithm can have more adaptivity than a general algorithm. However, any change in architecture or communication pattern requires a re-analysis and possibly a re-design of the complete routing algorithm.

4.1. Routing algorithm adapted from fault tolerance area

Chen and Chiu [16] presented a fault tolerant algorithm that can be used for routing in the presence of regions. However, the published algorithm had some errors which have later been corrected. The new version of this algorithm can be used in the presence of regions for reaching all destinations in a deadlock free manner [17].
We describe the basic ideas of the original algorithm here; for a full description of the algorithm, see [16] and [17]. For our purpose a faulty block described in the original algorithm is equivalent to a region. Chen and Chiu [16] borrow the idea of rings and chains from [13] to isolate the faulty nodes from the rest of the network. For messages which do not encounter any ring or chain, they allow non-adaptive routes which use at most one turn from source to destination. For messages encountering faulty blocks it becomes necessary to allow some extra turns which are forbidden during normal routing. Only a few combinations of forbidden turns are allowed, chosen such that these turns can never combine with each other (or with the allowed normal turns) to form a cycle.

Fig. 3. Message types and corresponding allowed routes in algorithm (labels: CF, RF, RO).

When routing on paths not affected by faults, messages are forwarded in the network according to their type, as illustrated in Fig. 3. A message is of type row first (RF) if it has the destination to its west. If the destination is to its north or south it is a column first (CF) message. A message of type RF can thus change to CF when it reaches the

column of destination. If it has its destination to its east it is of type column first (CF), except when the destination is in the same row, in which case it is row only (RO). A CF message can also change to RO if the destination is in the same row to its east. However, an RO message never changes its type. If a message hits the border of a faulty block, special rules apply depending on the type of the message and whether the border resides on a fault ring or a fault chain. There are different rules for routing around these depending on whether the faults are surrounded by an s-chain (a chain that touches the south border only), a non s-chain (a chain that touches only the west, or the west and south, border) or a ring (all other positions of rings and chains).

Fig. 4 illustrates routes for some messages when traveling in the presence of faulty blocks (regions). In the figure, messages are denoted by their source (Sn) and destination (Dn).

4.2. Application specific routing algorithms

Typical routing algorithms for NoC systems are designed for a specific network topology and are independent of the application which will be mapped on the NoC. If a small variation of the topology should occur (e.g., due to the merging of tiles of a mesh based network to form a region) the routers need to be redesigned. The use of routing tables helps to overcome this problem and makes the router general and configurable. Routing tables are filled with information which enables communication between every pair of network nodes. The constraint to be satisfied is that the channel dependency graph (CDG) [21] should not contain any cycle, which ensures that the routing is deadlock free [20].
To do this, some possible paths that allow two nodes to communicate must be prohibited, causing a degradation of routing adaptiveness. This is, however, a strong limitation in an embedded system scenario, because the designer cannot exploit knowledge of the application that will be mapped on the NoC.

Often the designer knows which cores communicate and which do not. To overcome this limitation a methodology to generate application specific routing functions has been proposed in [22]. The basic idea of this methodology, known as APSRA (APplication Specific Routing Algorithm), is to extend Duato's theory in such a way as to exploit the designer's knowledge about the communication characteristics of the application being implemented.

Fig. 5 shows the APSRA design methodology. The inputs of the methodology are: (1) the application modelled by means of task graphs, (2) the network topology modelled by means of a topology graph, and (3) a mapping function which maps each task of the task graph to a node of the topology graph. In addition, concurrency information, available after the task scheduling phase, can also be considered [29]. Using this information an application specific channel dependency graph (ASCDG) is built. In [22] it is proved that if the ASCDG is acyclic then the routing is deadlock free. Since the ASCDG is a sub-graph of the CDG, it is more likely to be acyclic. This likelihood is quite high since, in practical cases, each node of the network communicates with a small subset of other nodes. The result is that a number of dependencies that are present in the CDG (which is built by conservatively assuming that all the network nodes will communicate) are not present in the ASCDG (which is built by knowing the actual communicating pairs).
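The sub-graph relation between the CDG and the ASCDG can be illustrated with a short sketch. It assumes a regular 3 x 2 mesh without regions and minimal fully adaptive routing, and it uses a made-up list of communicating pairs rather than the communication graph of Fig. 6; the helper names (`minimal_paths`, `dependencies`, `is_acyclic`) are hypothetical and this is not the APSRA implementation of [22].

```python
from graphlib import CycleError, TopologicalSorter
from itertools import product


def minimal_paths(src, dst):
    """Enumerate every minimal path between two nodes of a regular 2D mesh."""
    if src == dst:
        return [[src]]
    x, y = src
    paths = []
    if x != dst[0]:
        step = (x + (1 if dst[0] > x else -1), y)
        paths += [[src] + p for p in minimal_paths(step, dst)]
    if y != dst[1]:
        step = (x, y + (1 if dst[1] > y else -1))
        paths += [[src] + p for p in minimal_paths(step, dst)]
    return paths


def dependencies(pairs):
    """Collect the channel dependencies induced by the given (src, dst) pairs."""
    deps = set()
    for src, dst in pairs:
        for path in minimal_paths(src, dst):
            channels = list(zip(path, path[1:]))
            deps.update(zip(channels, channels[1:]))
    return deps


def is_acyclic(deps):
    """Use the standard library topological sorter as a cycle detector."""
    graph = {}
    for c1, c2 in deps:
        graph.setdefault(c1, set()).add(c2)
    try:
        TopologicalSorter(graph).prepare()
        return True
    except CycleError:
        return False


nodes = list(product(range(3), range(2)))
all_pairs = [(s, d) for s in nodes for d in nodes if s != d]
# Hypothetical application: only three pairs of nodes ever communicate.
app_pairs = [((0, 0), (1, 1)), ((1, 1), (0, 0)), ((2, 0), (0, 0))]

cdg = dependencies(all_pairs)    # conservative: every node talks to every node
ascdg = dependencies(app_pairs)  # application specific

print(ascdg <= cdg, len(ascdg) < len(cdg))
print(is_acyclic(cdg), is_acyclic(ascdg))
```

For this illustrative set of pairs the ASCDG is acyclic while the CDG is not, so minimal fully adaptive routing could be kept for these communications without removing any path.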
However, if the ASCDG is not acyclic, a heuristic has been proposed in [22] to break all the cycles with the objective of minimising the impact on the degree of adaptiveness, under the constraint of guaranteeing destination reachability. The output of the methodology is a set of routing tables (one for each router of the NoC) which not only guarantees the reachability and the deadlock freeness of communication among tasks but also tries to maximise routing adaptivity. Finally, a compression technique can be used to compress the generated routing tables [27].

Fig. 4. Message routes when encountering fault rings and chains (labels: active nodes, faulty nodes, route, non s-chain, s-chain, fault-ring).

4.2.1. APSRA: A practical example

For the sake of example, let us consider the communication graph and the topology graph depicted in Fig. 6a and b respectively. Although for this example the topology is mesh-based, the approach is general and can be applied to any network topology without modification. As mapping function, let us consider M(T_i) = P_i, i = 1,2,3,4,5. The CDG for a minimal fully adaptive routing algorithm is shown in Fig. 6c. Since it contains six cycles, Duato's theorem cannot assure deadlock freeness of the minimal fully adaptive routing for this topology. The number of cycles is reduced to two for the ASCDG, as shown in Fig. 6d. We observe that some dependencies in the CDG are not present in the ASCDG. For instance, the edge corresponding to dependency l_{1,2} → l_{2,3} in the CDG does not appear in the ASCDG. In fact, channels l_{1,2} and l_{2,3} can be used in sequence only for the communications T1 → T3, T1 → T6, and T4 → T3, which are not present in the communication graph (CG). Although in this case, too, we cannot assure deadlock freeness, we can simply break the cycles as follows. The application specific channel dependency l_{4,1} → l_{1,2} is due to the communication

T4 → T2. Such communication can be realized by both paths P4 → P5 → P2 and P4 → P1 → P2. If the routing function is restricted in such a way that the latter path is prohibited, the application specific channel dependency l_{4,3} → l_{3,1} does not exist any longer. In a similar way it is possible to break the second cycle by removing, for instance, the dependency l_{1,4} → l_{1,5} due to the communication T1 → T5. However, this restriction reduces the degree of adaptiveness of the routing. Now suppose that we have some knowledge about communication concurrency, and suppose that communication T1 → T5 and communication T2 → T4 do not overlap in time. Fig. 6e highlights the dependencies due to such communications. Since these communications are not concurrent, the associated dependencies are not concurrently active either. The result is that the two cycles are actually false cycles. In conclusion, for this latter case a minimal fully adaptive routing is deadlock free.

Fig. 5. Overview of APSRA design methodology (labels: application to be mapped, communication graph, mapping function, network topology, communication concurrency, memory budget, APSRA, routing tables, compression, compressed routing tables).

Fig. 6. Comparison of cyclic dependencies without and with APSRA methodology (panels: a, communication graph; b, topology graph; c, channel dependency graph; d, application specific channel dependency graph; e).
Some notes about APSRA's complexity

The construction of the ASCDG involves the annotation of each minimum path between any source/destination

pair as defined in the communication graph. The basic assumption is that we start from a minimal fully adaptive routing algorithm. If we consider a mesh-based topology, the complexity of annotating all the minimal paths for a given source-destination pair is O(2^n), where n is the dimension of the quadrant containing the source and destination nodes. This means that, as the NoC size increases, the approach could become infeasible if some nodes located far from each other need to communicate. It should be pointed out, however, that this is a worst-case condition and that it can be managed efficiently by considering the following. First, any topological mapping algorithm (like [26,32–35]) tries to map the most frequent and most critical communications so as to minimise the physical distance between the source and destination nodes. This leads to mapped architectures which seem to mimic a kind of small-world phenomenon [36], in which many communications follow short paths and few communications require long paths. Second, the long distance communications, which determine the complexity of building the ASCDG, can be treated in a more practical way: for these communications, one can consider only a subset of all the minimal paths. To limit the number of minimal paths to be annotated, one idea is to fix a budget of minimal paths that can be used for any communication. In this way, the complexity can be tuned by simply modifying the budget, which can be considered a user-defined parameter.

Hardware implications of APSRA

There are two main ways to implement a routing algorithm, depending on the way the underlying routing function is implemented.
The first way is to implement the routing function in hardware logic. In this case an FSM can be used to compute the set of admissible output ports based on the current node address, the destination address and some status information stored in the router. For simple routing functions, this results in small and fast routers. This method has been used by several NoC proposals [4,30].

The second way to implement the routing function is to use a routing table [31]. A schematic diagram of the architecture of a table based router is shown in Fig. 7. The destination address is used to compute the address of the table entry, which encodes the set of admissible output ports on which the message can be forwarded. The main advantages of table-based routers are their flexibility and configurability, and the possibility of implementing any complex routing function without any variation in cost, since the data stored in the table defines the routing function. The drawback is that, in general, table-based implementations are costly, both in terms of silicon area and power dissipation, compared to implementations that use custom logic for the routing function. To cope with this drawback several techniques have been proposed [27,37,38]. All these techniques strive for the same objective, namely the compression of the routing table and the design of new router architectures which are able to work with the compressed tables. In [29] we showed that the cost overhead of a routing table implementation based on the compression technique and architecture presented in [27] represents only a small fraction of the overall router cost. In particular, for lossless compression, we found that the overhead over an XY router is about 10% (this overhead can be reduced much further whenever a small degradation in routing adaptivity is admitted).
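The table-based routing function described above can be sketched in software as a simple lookup. This is only an illustration under stated assumptions: the class name, the port labels and the table contents (a router P1 in a 2 × 3 mesh) are hypothetical, and the selection policy (first free port in alphabetical order) is chosen only to keep the example deterministic.

```python
class TableRouter:
    """Table-based routing function: destination -> admissible output ports."""

    def __init__(self, table):
        # table: dict mapping a destination address to a set of port names
        self.table = {dst: frozenset(ports) for dst, ports in table.items()}

    def admissible_ports(self, dst):
        """Look up the set of output ports the message may be forwarded on."""
        return self.table[dst]

    def select_port(self, dst, busy_ports=()):
        """Pick one admissible port that is not busy, or None if all are blocked."""
        free = sorted(self.admissible_ports(dst) - set(busy_ports))
        return free[0] if free else None


# Hypothetical table for router P1 of a 2 x 3 mesh (P1 P2 P3 / P4 P5 P6).
p1 = TableRouter({
    "P2": {"E"},
    "P3": {"E"},
    "P4": {"S"},
    "P5": {"E", "S"},  # two minimal routes, hence two admissible ports
    "P6": {"E", "S"},
})
```

Changing the routing function only means changing the table contents, which is the flexibility argument made above; the price is the storage for one entry per destination, which is what the compression techniques address.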
As regards energy cost, we determined the energy dissipated in a router by running Synopsys Design Power on the gate-level netlist of the router (including the FIFO buffers) when it is stimulated by different random input data streams. The average energy dissipated by a flit for one hop in the network was estimated to be … nJ, … nJ, and … nJ for the XY-based, Chen and Chiu-based, and table-based routers, respectively. Although the energy consumption of table-based routers is higher than that of the other routers, this does not mean that the overall NoC energy consumption is higher as well. It should be pointed out that flit switching is not the only source of power dissipation in NoCs. That is, even if accessing compressed routing tables requires additional energy, this may be balanced by a reduced usage of FIFO buffers due to better avoidance of congestion.

Fig. 7. Schematic diagram of a table based router architecture.

5. Evaluation and comparison of algorithms

5.1. Adaptivity analysis

One metric to characterize an adaptive routing algorithm is the degree of adaptiveness [7], which is essentially a measure of the number of paths the algorithm allows from the source to the destination. More precisely, it is defined as the average of the degree of adaptiveness of all communicating pairs. For a given source-destination pair the degree of adaptiveness is defined as the ratio between the number of admissible paths and the total number of paths connecting the source node to the destination node. From a practical point of view, the degree of adaptiveness for a given routing algorithm R has been obtained

by averaging the degree of adaptiveness for each communication of the communication graph. The degree of adaptiveness of a communication c is computed as follows:

    P_s = M(src(c));
    P_d = M(dst(c));
    n = NumberOfAdmissiblePaths(R, P_s, P_d);
    m = NumberOfPaths(P_s, P_d);
    return n/m;

where src(c) and dst(c) return the source and destination task of the communication c; M is the mapping function; NumberOfAdmissiblePaths(R, P_s, P_d) returns the number of paths R allows from P_s to P_d; and NumberOfPaths(P_s, P_d) returns the total number of paths from P_s to P_d.

Fig. 8 shows the degree of adaptiveness of both APSRA and Chen and Chiu's routing algorithm for a 7 × 7 NoC with a 2 × 2 region placed at the center of the NoC with four access points (Center, 4AP) and one access point (Center, 1AP), and at the bottom left corner of the NoC with three access points (BL, 3AP) and one access point (BL, 1AP). On average the degree of adaptiveness exhibited by APSRA exceeds the 80% mark, which indicates the effectiveness of the approach. To compare the algorithms for different region sizes, we define a new adaptivity measure called relative adaptivity. It represents the ratio between the total number of admitted paths when the region is present and the number of paths without the region. Fig. 9a shows the relative adaptivity for a region of varying size located at the center of the NoC, whereas Fig. 9b shows this variation for regions located at the bottom left corner of the NoC. In both cases and for each region the access point is located at the top right corner. As expected, the relative adaptivity in general decreases as the size of the region increases.
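The computation of the degree of adaptiveness can be made concrete with a small sketch. It rests on several assumptions that are not in the text: paths are restricted to minimal paths on a region-free mesh, nodes are (x, y) coordinates, and the routing algorithm R is modelled as a set of prohibited turns (here the turns that XY routing forbids). The names `count_minimal_paths`, `degree_of_adaptiveness` and `XY_PROHIBITED` are illustrative only.

```python
from functools import lru_cache

MOVES = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

# Turns prohibited by XY routing: no turn from a Y direction into an X direction.
XY_PROHIBITED = frozenset({("N", "E"), ("N", "W"), ("S", "E"), ("S", "W")})


def count_minimal_paths(src, dst, prohibited_turns=frozenset()):
    """Number of minimal paths from src to dst that avoid the prohibited turns."""

    def dist(p):
        return abs(dst[0] - p[0]) + abs(dst[1] - p[1])

    @lru_cache(maxsize=None)
    def walk(pos, heading):
        if pos == dst:
            return 1
        total = 0
        for d, (mx, my) in MOVES.items():
            nxt = (pos[0] + mx, pos[1] + my)
            if dist(nxt) >= dist(pos):
                continue  # not a productive (minimal) hop
            if heading is not None and (heading, d) in prohibited_turns:
                continue  # turn not admitted by the routing algorithm
            total += walk(nxt, d)
        return total

    return walk(src, None)


def degree_of_adaptiveness(pairs, prohibited_turns):
    """Average of n/m over all communicating (source, destination) pairs."""
    ratios = [
        count_minimal_paths(s, d, prohibited_turns) / count_minimal_paths(s, d)
        for s, d in pairs
    ]
    return sum(ratios) / len(ratios)
```

For example, between (0, 0) and (2, 2) there are six minimal paths, of which XY routing admits exactly one, so that pair contributes 1/6 to the average, whereas a minimal fully adaptive algorithm (no prohibited turns) contributes 1.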
For regions located at the corner of the NoC there is a minimum in relative adaptivity when the region size is half the dimension of the mesh NoC. If the region size increases further, the relative adaptivity increases. This effect is caused by the fact that a region located at the bottom left corner of the NoC obstructs only communications between nodes located in the north quadrant and the east quadrant of the region. The number of these nodes is equal for regions 3 × 3, 4 × 3, and 4 × 4. For this reason, whilst the number of paths without region decreases on average (because the access point moves in the direction of the center of the NoC), the number of paths remains fairly the same when the region size increases from 3 × 3 to 4 × 3 and further to 4 × 4.

Fig. 8. Adaptiveness vs. access points and placement of regions.

Fig. 9. Relative adaptiveness vs. size of region: (a) region in centre and (b) region in bottom left corner.

Simulation based evaluation

In addition to the analysis of adaptivity, we evaluated the two algorithms by using simulation models. For this purpose we developed a NoC simulator in SDL (Specification and Description Language). The simulator supports both regular and irregular mesh topologies. To understand the basic behaviour of the algorithms, our first simulations are performed with synthetic communication patterns, where a single region is placed at different positions in a 7 × 7 mesh. In a second set-up we use the communication pattern of a real multimedia application. The simulation model is in this case an 8 × 8 NoC with a total of five regions. The simulator implements wormhole switching with a packet size of 10 flits. Every router has two-flit input and one-flit output buffers. The router simultaneously routes packets destined to non-conflicting output ports. The minimal link delay is three cycles/flit and the maximum link bandwidth is 0.5 flits/cycle (1 packet/20 cycles). Cores are modeled as traffic generators, and the resource network interface has an output buffer large enough to keep packet generation unaffected by network

conditions. The flits in a packet are sent in burst mode at the maximum link bandwidth, and the gap between packets has a Poisson distribution with λ = 10. Simulations were carried out using the Telelogic SDL simulation tool (Tau).

The following parameters were used to study the performance of a NoC platform. Performance values were collected over 60,000 packets, after a warm-up session of …,000 packets.

Average Latency: The average transmission delay of a packet from the source (when the header leaves) to the destination (when the tail has arrived).
Blocked Routing Cycles/Router: The total number of routing cycles during which packets were blocked in a router.

Latency is measured to get an overall view of how the performance of the network is affected by changes in network configuration and packet injection rate. Blocked Routing Cycles/Router gives information about where the network is most congested.

Results with synthetic traffic patterns

Destinations for generated packets are randomly selected, with a hot-spot probability of 60% for region access points. We compare APSRA and Chen and Chiu's algorithm with the region either in the bottom left corner with three access points (bl_ap3) or in the centre of the network with four access points (c_ap4). Latency values are in this case averaged over five random traffic scenarios to reduce the risk of exceptional cases.

Communication traffic is classified into three types: traffic to region; other traffic, where a resource other than the region is the destination; and all communications, which is the aggregate of the first two types.

The first result shows the average latency for all communications in the network, as depicted in Fig. 10.
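As an aside on the set-up just described, the synthetic traffic generation (hot-spot destinations and Poisson-distributed gaps) can be sketched as follows. This is not the SDL simulator: the function names, the access-point coordinates and the interpretation of the gap as a whole number of time units are assumptions made for illustration, and the Poisson sampler uses Knuth's multiplication method.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def pick_destination(src, nodes, access_points, rng, p_hot=0.6):
    """With probability p_hot pick a region access point, else any other node."""
    hot = [n for n in access_points if n != src]
    cold = [n for n in nodes if n != src and n not in access_points]
    if hot and rng.random() < p_hot:
        return rng.choice(hot)
    return rng.choice(cold)


# Hypothetical 7 x 7 mesh with four access points around a central region.
nodes = [(x, y) for x in range(7) for y in range(7)]
access_points = [(2, 2), (2, 3), (3, 2), (3, 3)]

rng = random.Random(1)  # fixed seed so that runs are repeatable
destinations = [pick_destination((0, 0), nodes, access_points, rng)
                for _ in range(20000)]
gaps = [poisson(10, rng) for _ in range(5000)]
```

Over many draws roughly 60% of the destinations are access points and the mean gap is close to 10, which matches the parameters quoted above.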
The lowest latency values are obtained for APSRA with central region (apsra_c_ap4). The second lowest latency values are obtained with Chen and Chiu's algorithm and central region (chiu_c_ap4). Although APSRA clearly displays lower latency for the identical case, this indicates that the position of the region is of higher importance than which routing algorithm is used. Looking at how the algorithms perform when the region is placed at the bottom left corner, APSRA (apsra_bl_ap3) again shows lower latency than Chen and Chiu's algorithm (chiu_bl_ap3). The difference is not as large as in the central region set-up, but seems to grow with increased load.

Fig. 10. Average latency for all communications with region placed in bottom left (bl) and centre (c), vs. packet injection rate in % of link bandwidth (LBW).

In Fig. 11 we give the average latency for traffic with destinations other than the region. The worst position from a latency point of view, up to an injection rate of 5%, is with Chen and Chiu's algorithm and region in centre (chiu_c_ap4). All the other combinations provide similar latency values in this range. However, when the injection rate is increased above 5%, Chen and Chiu's algorithm with region in corner position (chiu_bl_ap3) rapidly saturates. Next to saturate is APSRA with region in corner (apsra_bl_ap3). The best result from a saturation point of view is obtained with APSRA and region in centre (apsra_c_ap4), although it has slightly higher latency at lower injection rates. In any case, placing a region in the centre seems to have less tendency to create severe congestion.

Fig. 11. Average latency for communications destined outside region, with region in bottom left (bl) and centre (c), vs. injection rate in % of link bandwidth (LBW).

We also give results for traffic destined only to the region (see Fig. 12). In this case also, APSRA with central region shows the best performance in terms of low latency. In this case, however, Chen and Chiu's algorithm with central region clearly gives better results than both algorithms with region at bottom left position. Compared with the traffic to region results, this is the case for all injection rates. The worst performance in this measurement is again shown by Chen and Chiu's algorithm with region in bottom left corner.

Fig. 12. Average latency for communications destined to region in bottom left (bl) and centre (c), vs. injection rate in % of link bandwidth (LBW).

Fig. 13 gives more detail about what causes the difference in latency values. The diagrams show for how many routing cycles packets were blocked in the different routers. These results are from one of the simulations with 10% packet injection rate, where the difference in latency was very large. Note that the scale of blocked routing cycles is not the same in the two diagrams.

Fig. 13a and b reveal that the APSRA algorithm does not cause as much blockage as Chen and Chiu's algorithm. It can be noted that the APSRA algorithm, in Fig. 13a, also shows a large number of blocked packets around the border of the region. This increase results from packet routes which have to circumvent the region to reach their destinations. Still, the distribution is fairly even and the blockage much smaller than for Chen and Chiu's algorithm, in Fig. 13b.

Note that Chen and Chiu's algorithm results in more blockages close to the north and west borders of the region. The reason is that this path is highly utilized by the algorithm in the procedures for routing around the region border. As a result these paths easily become congested, which results in more situations where packets get blocked. APSRA, on the other hand, is not biased towards specific routes, and thus spreads the traffic more evenly around the border.
As APSRA in many situations has several paths to select from, it is also possible to avoid congested routes, which further decreases the blockage.

Fig. 13. Blocked routing cycles/router with (a) APSRA algorithm and (b) Chen and Chiu's algorithm.

Multimedia application

As a real case study, we consider a multimedia application which implements an H.263 video decoder and an MP3 audio decoder [25]. Fig. 14 shows the communication graph and the mapping of the tasks onto the NoC. The mapping has been obtained by using a modified version of the approach presented in [26]. A total of five regions are used in this case. Three big regions are used to host two memories and a buffer. Two small regions host the motion compensation (MC) block and the ADD block. We consider one access point for each region. The location of the access point is represented with a black dot. The remaining gray tiles of the NoC are assumed to communicate in a random fashion. The degree of adaptiveness exhibited by the routing algorithm generated by APSRA is …. In particular, the communications belonging to the audio/video decoder can be routed using a minimal fully adaptive routing algorithm. Only a few restrictions on routing are applied to the random traffic.

Fig. 14. Multimedia application, (a) communication graph and (b) topology mapping.

Fig. 15 shows the average latency for different packet injection rates exhibited by APSRA and Chen and Chiu's algorithm. As can be seen, the APSRA algorithm has a performance advantage. The latency at lower load situations is

clearly lower, and for higher packet injection rates APSRA manages to keep the communication below saturation up to a packet injection rate of 45%. For Chen and Chiu's algorithm saturation instead occurs at 30%.

Discussion on results

The simulation results show that APSRA has an overall advantage in communication latency for identical traffic scenarios. This is probably an effect of its unbiased behavior, as compared to Chen and Chiu's algorithm, which has a tendency to create highly congested routes. In addition, the higher adaptivity of the algorithm makes it possible to avoid congested routes. This is especially visible in the results for traffic not destined to the region. In this case, a large difference is shown between APSRA and Chen and Chiu's algorithm for the region in the centre. Even though the average distance for APSRA is slightly longer for a region in the centre, as indicated by somewhat higher latency at lower loads, APSRA manages to keep communication below saturation up to approximately 8%. For the same scenario, Chen and Chiu's algorithm has significantly higher latency.

Considering traffic to region, the latency is dominated more by the distance from the sources to the destinations, which in this case is shorter with a centrally placed region. Since traffic to the region has a probability of 60%, this also dominates the average latency when we consider the all communications case. A large difference can be identified when comparing injection rates and saturation between the synthetic and multimedia simulations. This can be explained by two different properties of the models. First, the number of communications is larger in the synthetic simulations; on average every node generates traffic to one other node.
Second, the hot-spot traffic increases the risk of high local traffic rates, which further increases the risk of congestion.

Fig. 15. Average latency for multimedia application, vs. injection rate in % of link bandwidth (LBW).

Conclusions

In this paper we have highlighted the importance of the region concept in mesh topology NoC architecture. We have also listed new issues which a designer will encounter while designing a heterogeneous mesh topology NoC system using multi-port or multi-access point cores. We presented and compared two deadlock free routing algorithms for mesh NoC with regions. Our analysis and simulation based evaluation demonstrate that minimal distance deadlock free algorithms designed using the APSRA methodology outperform the other algorithm, borrowed from the fault tolerant area, in terms of adaptivity and latency. The area of a NoC router required by the APSRA based algorithm is expected to be larger than that of the router for the other algorithm. This is because APSRA requires tables (memory) within each router to store routing information, whereas the other algorithm can be implemented as an optimized FSM. However, routing table compression techniques can be used to improve the cost/performance tradeoff in table-based routers [27,38]. In [29], we have shown that re-configurability of routing tables can be used to enhance communication performance for applications in which communication patterns change during execution. Future developments will mainly address the definition of design space exploration strategies to optimally determine region placement, shape, and number of access points.

7. Uncited reference

[28]

Acknowledgements

We thank Prof. Petru Eles for valuable discussions and suggestions during the development of this research.
The work reported in this paper was supported by the project "Specialization and Evaluation of Network on Chip Architectures for multi-media applications", funded by the Swedish K.K. Foundation. We are also thankful to the anonymous reviewers for their constructive comments, which helped us to improve the manuscript.

References

[1] S. Kumar, A. Jantsch, J-P. Soininen, M. Forsell, M. Millberg, J. Öberg, K. Tiensyrjä, A. Hemani, A network on chip architecture and design methodology, in: Proceedings IEEE Annual Symposium on VLSI, Pittsburgh, PA, USA, April 2002.
[2] W.J. Dally, B. Towles, Route packets, not wires: On-chip interconnection networks, in: Proceedings Design Automation Conference, Las Vegas, NV, June 2001.
[3] P. Guerrier, A. Greiner, A generic architecture for on-chip packet-switched interconnections, in: Proceedings Design and Test in Europe, March 2000.
[4] E. Bolotin, A. Morgenshtein, I. Cidon, R. Ginosar, A. Kolodny, Automatic hardware-efficient SoC integration by QoS network on chip, in: Proceedings IEEE International Conference on Electronics, Circuits and Systems, December 2004.
[5] P.P. Pande, C. Grecu, A. Ivanov, R. Saleh, Design of a switch for network on chip applications, in: Proceedings International Symposium on Circuits and Systems (ISCAS), vol. 5, May 2003.
[6] E. Fleury, P. Fraigniaud, A general theory for deadlock avoidance in wormhole-routed networks, IEEE Transactions on Parallel and Distributed Systems 9 (7) (1998).
[7] J. Duato, S. Yalamanchili, L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann.
[8] P. Vellanki, N. Banerjee, K.S. Chatha, Quality-of-service and error control techniques for mesh-based network-on-chip architectures, Integration VLSI Journal 38 (3) (2005).
[9] L.M. Ni, P.K. McKinley, A survey of wormhole routing techniques in direct networks, IEEE Computer 26 (1993).
[10] C.J. Glass, L.M. Ni, The turn model for adaptive routing, Journal of the Association for Computing Machinery 41 (5) (1994).
[11] C. Neeb, N. Wehn, Designing efficient irregular networks for heterogeneous systems-on-chip, in: EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools, August.
[12] W.J. Dally, H. Aoki, Deadlock-free adaptive routing in multicomputer networks using virtual channels, IEEE Transactions on Parallel and Distributed Systems 4 (4) (1993).
[13] R.V. Boppana, S. Chalasani, Fault-tolerant wormhole routing algorithms for mesh networks, IEEE Transactions on Computers (7) (1995).
[14] J. Zhou, C.M. Lau, Fault-tolerant wormhole routing in 2D meshes, in: Proceedings International Symposium on Parallel Architectures, Algorithms and Networks, December 2000.
[15] J. Wu, A fault-tolerant and deadlock-free routing protocol in 2D meshes based on odd-even turn model, IEEE Transactions on Computers 52 (9) (2003).
[16] K.-H. Chen, G.-M. Chiu, Fault-tolerant routing algorithm for meshes without using virtual channels, Journal of Information Science and Engineering 14 (4) (1998).
[17] R. Holsmark, S. Kumar, Design issues and performance evaluation of mesh NoC with regions, in: Proceedings Norchip Conference, Oulu, Finland, November 2005.
[18] A. Mejia, J. Flich, J. Duato, S.-A. Reinemo, T. Skeie, Segment-based routing: An efficient fault-tolerant routing algorithm for meshes and tori, in: Parallel and Distributed Processing Symposium, April.
[19] S. Murali, D. Atienza, L. Benini, G. De Micheli, A multi-path routing strategy with guaranteed in-order packet delivery and fault tolerance for networks on chip, in: Proceedings Design Automation Conference, San Francisco, California, USA, July 2006.
[20] J. Duato, A new theory of deadlock-free adaptive routing in wormhole networks, IEEE Transactions on Parallel and Distributed Systems 4 (12) (1993).
[21] W.J. Dally, C. Seitz, Deadlock-free message routing in multiprocessor interconnection networks, IEEE Transactions on Computers 36 (5) (1987).
[22] M. Palesi, R. Holsmark, S. Kumar, V. Catania, A methodology for design of application specific deadlock-free routing algorithms for NoC systems, in: Proceedings International Conference on Hardware-software Codesign and System Synthesis, Seoul, Korea, October 2007.
[23] S. Ishiwata et al., A single-chip MPEG-2 codec based on customizable media embedded processor, IEEE Journal of Solid-State Circuits 38 (3) (2003).
[24] G.-M. Chiu, The odd-even turn model for adaptive routing, IEEE Transactions on Parallel and Distributed Systems 11 (7) (2000).
[25] K. Srinivasan, K.S. Chatha, G. Konjevod, Linear-programming-based techniques for synthesis of network-on-chip architectures, IEEE Transactions on Very Large Scale Integration Systems 14 (4) (2006).
[26] G. Ascia, V. Catania, M. Palesi, A multi-objective genetic approach to mapping problem on network-on-chip, Journal of Universal Computer Science 12 (4) (2006).
[27] M. Palesi, S. Kumar, R. Holsmark, A method for router table compression for application specific routing in mesh topology NoC architectures, in: SAMOS VI: Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos, Greece, July.
[28] R. Holsmark, M. Palesi, S. Kumar, Deadlock free routing algorithms for mesh topology NoC systems with regions, in: EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools, August 2006.
[29] M. Palesi, S. Kumar, R. Holsmark, V. Catania, Exploiting communication concurrency for efficient deadlock free routing in reconfigurable NoC platforms, in: 14th Reconfigurable Architectures Workshop, March 27–28, 2007, Long Beach, California, USA.
[30] X. Wang, D.S.-Tortosa, T. Ahonen, J. Nurmi, Asynchronous network node design for network-on-chip, in: International Symposium on Signals, Circuits and Systems, July 2005.
[31] A.S. Vaidya, A. Sivasubramaniam, C.R. Das, LAPSES: A recipe for high performance adaptive router design, in: 5th International Symposium on High-Performance Computer Architecture, January 1999.
[32] J.-M. Chang, M. Pedram, Codex-dp: Co-design of communicating systems using dynamic programming, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 19 (7) (2000).
[33] T. Lei, S. Kumar, A two-step genetic algorithm for mapping task graphs to a network on chip architecture, in: EUROMICRO Symposium on Digital Systems Design, September.
[34] S. Murali, G. De Micheli, Bandwidth-constrained mapping of cores onto NoC architectures, in: Design, Automation, and Test in Europe, February 2004.
[35] J. Hu, R. Marculescu, Energy- and performance-aware mapping for regular NoC architectures, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 24 (4) (2005).
[36] U.Y. Ogras, R. Marculescu, It's a small world after all: NoC performance optimization via long-range link insertion, IEEE Transactions on Very Large Scale Integration Systems 14 (7) (2006).
[37] J. Flich, A. Mejia, P. Lopez, J. Duato, Region-based routing: An efficient routing mechanism to tackle unreliable hardware in network on chips, in: First IEEE/ACM International Symposium on Networks-on-Chip, May.
[38] E. Bolotin, I. Cidon, R. Ginosar, A. Kolodny, Routing table minimization for irregular mesh NoCs, in: Design Automation and Test in Europe, March.
[39] Rickard Holsmark, Shashi Kumar, Corrections to Chen and Chiu's fault tolerant routing algorithm for mesh networks, Journal of Information Science and Engineering 23 (6) (2007).

Rickard Holsmark is a Ph.D. student with the Embedded System Group at School of Engineering, Jönköping University, Sweden. His research is focused on specialized architectures and routing algorithms for Networks on Chip. Other areas of interest are embedded systems in general, system level design and processor architectures. He received a Bachelor of Science degree (2001) in electronics, with specialization in microcontroller systems. After this he completed a Master of Science degree (2003) in electronics, with specialization in embedded systems. Both of these degrees were received at Jönköping University.

Maurizio Palesi received the Dr. Eng. degree and the Ph.D. degree in computer engineering from Università di Catania, Italy, in 1999 and 2003, respectively. Since December 2003, he has held a research contract as Assistant Professor at the Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Facoltà di Ingegneria, Università di Catania. Since January 2007 he has been Associate Editor of VLSI Design Journal, Hindawi Publishing Corporation. His research focuses on platform based system design, design space exploration, low-power techniques for embedded systems, and Network-on-Chip architectures.

Shashi Kumar is a professor of Embedded Systems at School of Engineering, Jönköping University. His research interests include system-level modeling and synthesis, parallel architectures and algorithms, reconfigurable computing and heuristic search algorithms. He was a member of the team which was the first to propose the idea of packet switched communication for on-chip communication and coined the term Network on Chip (NoC). Prof. Kumar has an interest in various aspects of NoC design, including NoC topologies, QoS issues in NoC communication, NoC architectural modeling and evaluation, application specific NoC architecture design, mapping applications to NoC platforms and testing of NoC. He received B.Tech, M.Tech and PhD degrees from the Indian Institute of Technology Delhi in 1974, 1976 and 1985, respectively.


Fault-Tolerant Routing Algorithm in Meshes with Solid Faults Fault-Tolerant Routing Algorithm in Meshes with Solid Faults Jong-Hoon Youn Bella Bose Seungjin Park Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science Oregon State University

More information

Fault-Tolerant and Deadlock-Free Routing in 2-D Meshes Using Rectilinear-Monotone Polygonal Fault Blocks

Fault-Tolerant and Deadlock-Free Routing in 2-D Meshes Using Rectilinear-Monotone Polygonal Fault Blocks Fault-Tolerant and Deadlock-Free Routing in -D Meshes Using Rectilinear-Monotone Polygonal Fault Blocks Jie Wu Department of Computer Science and Engineering Florida Atlantic University Boca Raton, FL

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults

Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Fault-Tolerant Wormhole Routing Algorithms in Meshes in the Presence of Concave Faults Seungjin Park Jong-Hoon Youn Bella Bose Dept. of Computer Science Dept. of Computer Science Dept. of Computer Science

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

Sanaz Azampanah Ahmad Khademzadeh Nader Bagherzadeh Majid Janidarmian Reza Shojaee

Sanaz Azampanah Ahmad Khademzadeh Nader Bagherzadeh Majid Janidarmian Reza Shojaee Sanaz Azampanah Ahmad Khademzadeh Nader Bagherzadeh Majid Janidarmian Reza Shojaee Application-Specific Routing Algorithm Selection Function Look-Ahead Traffic-aware Execution (LATEX) Algorithm Experimental

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Fault-adaptive routing

Fault-adaptive routing Fault-adaptive routing Presenter: Zaheer Ahmed Supervisor: Adan Kohler Reviewers: Prof. Dr. M. Radetzki Prof. Dr. H.-J. Wunderlich Date: 30-June-2008 7/2/2009 Agenda Motivation Fundamentals of Routing

More information

A Deterministic Fault-Tolerant and Deadlock-Free Routing Protocol in 2-D Meshes Based on Odd-Even Turn Model

A Deterministic Fault-Tolerant and Deadlock-Free Routing Protocol in 2-D Meshes Based on Odd-Even Turn Model A Deterministic Fault-Tolerant and Deadlock-Free Routing Protocol in 2-D Meshes Based on Odd-Even Turn Model Jie Wu Dept. of Computer Science and Engineering Florida Atlantic University Boca Raton, FL

More information

Deadlock and Livelock. Maurizio Palesi

Deadlock and Livelock. Maurizio Palesi Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? J. Flich 1,P.López 1, M. P. Malumbres 1, J. Duato 1, and T. Rokicki 2 1 Dpto. Informática

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract.

Fault-Tolerant Routing in Fault Blocks. Planarly Constructed. Dong Xiang, Jia-Guang Sun, Jie. and Krishnaiyan Thulasiraman. Abstract. Fault-Tolerant Routing in Fault Blocks Planarly Constructed Dong Xiang, Jia-Guang Sun, Jie and Krishnaiyan Thulasiraman Abstract A few faulty nodes can an n-dimensional mesh or torus network unsafe for

More information

NOC Deadlock and Livelock

NOC Deadlock and Livelock NOC Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem Reading W. Dally, C. Seitz, Deadlock-Free Message Routing on Multiprocessor Interconnection Networks,, IEEE TC, May 1987 Deadlock F. Silla, and J. Duato, Improving the Efficiency of Adaptive Routing in

More information

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design Zhi-Liang Qian and Chi-Ying Tsui VLSI Research Laboratory Department of Electronic and Computer Engineering The Hong Kong

More information

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Philipp Gorski, Tim Wegner, Dirk Timmermann University

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

Routing Algorithms. Review

Routing Algorithms. Review Routing Algorithms Today s topics: Deterministic, Oblivious Adaptive, & Adaptive models Problems: efficiency livelock deadlock 1 CS6810 Review Network properties are a combination topology topology dependent

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

A Distributed Formation of Orthogonal Convex Polygons in Mesh-Connected Multicomputers

A Distributed Formation of Orthogonal Convex Polygons in Mesh-Connected Multicomputers A Distributed Formation of Orthogonal Convex Polygons in Mesh-Connected Multicomputers Jie Wu Department of Computer Science and Engineering Florida Atlantic University Boca Raton, FL 3343 Abstract The

More information

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections A.SAI KUMAR MLR Group of Institutions Dundigal,INDIA B.S.PRIYANKA KUMARI CMR IT Medchal,INDIA Abstract Multiple

More information

High Performance Interconnect and NoC Router Design

High Performance Interconnect and NoC Router Design High Performance Interconnect and NoC Router Design Brinda M M.E Student, Dept. of ECE (VLSI Design) K.Ramakrishnan College of Technology Samayapuram, Trichy 621 112 brinda18th@gmail.com Devipoonguzhali

More information

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom ISCA 2018 Session 8B: Interconnection Networks Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom Aniruddh Ramrakhyani Georgia Tech (aniruddh@gatech.edu) Tushar

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms CS252 Graduate Computer Architecture Lecture 16 Multiprocessor Networks (con t) March 14 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Young Hoon Kang, Taek-Jun Kwon, and Jeff Draper {youngkan, tjkwon, draper}@isi.edu University of Southern California

More information

Design of a System-on-Chip Switched Network and its Design Support Λ

Design of a System-on-Chip Switched Network and its Design Support Λ Design of a System-on-Chip Switched Network and its Design Support Λ Daniel Wiklund y, Dake Liu Dept. of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract As the degree of

More information

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 22, NO. 1, JANUARY

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 22, NO. 1, JANUARY IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 22, NO. 1, JANUARY 2014 113 ZoneDefense: A Fault-Tolerant Routing for 2-D Meshes Without Virtual Channels Binzhang Fu, Member, IEEE,

More information

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng. Prof. Natalie Enright Jerger

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng. Prof. Natalie Enright Jerger ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng Prof. Natalie Enright Jerger Announcements Feedback on your project proposals This week Scheduled extended 1 week Next week:

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

PERFORMANCE EVALUATION OF FAULT TOLERANT METHODOLOGIES FOR NETWORK ON CHIP ARCHITECTURE

PERFORMANCE EVALUATION OF FAULT TOLERANT METHODOLOGIES FOR NETWORK ON CHIP ARCHITECTURE PERFORMANCE EVALUATION OF FAULT TOLERANT METHODOLOGIES FOR NETWORK ON CHIP ARCHITECTURE By HAIBO ZHU A thesis submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN

More information

Demand Based Routing in Network-on-Chip(NoC)

Demand Based Routing in Network-on-Chip(NoC) Demand Based Routing in Network-on-Chip(NoC) Kullai Reddy Meka and Jatindra Kumar Deka Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, India Abstract

More information

Mapping of Real-time Applications on

Mapping of Real-time Applications on Mapping of Real-time Applications on Network-on-Chip based MPSOCS Paris Mesidis Submitted for the degree of Master of Science (By Research) The University of York, December 2011 Abstract Mapping of real

More information

MESH-CONNECTED networks have been widely used in

MESH-CONNECTED networks have been widely used in 620 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 5, MAY 2009 Practical Deadlock-Free Fault-Tolerant Routing in Meshes Based on the Planar Network Fault Model Dong Xiang, Senior Member, IEEE, Yueli Zhang,

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

Generic Methodologies for Deadlock-Free Routing

Generic Methodologies for Deadlock-Free Routing Generic Methodologies for Deadlock-Free Routing Hyunmin Park Dharma P. Agrawal Department of Computer Engineering Electrical & Computer Engineering, Box 7911 Myongji University North Carolina State University

More information

Lecture 22: Router Design

Lecture 22: Router Design Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO 03, Princeton A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip

More information

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing Jose Flich 1,PedroLópez 1, Manuel. P. Malumbres 1, José Duato 1,andTomRokicki 2 1 Dpto.

More information

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees, butterflies,

More information

Mohammad Hossein Manshaei 1393

Mohammad Hossein Manshaei 1393 Mohammad Hossein Manshaei manshaei@gmail.com 1393 Voice and Video over IP Slides derived from those available on the Web site of the book Computer Networking, by Kurose and Ross, PEARSON 2 Multimedia networking:

More information

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Maheswari Murali * and Seetharaman Gopalakrishnan # * Assistant professor, J. J. College of Engineering and Technology,

More information

Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution

Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution Nishant Satya Lakshmikanth sailtosatya@gmail.com Krishna Kumaar N.I. nikrishnaa@gmail.com Sudha S

More information

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks Jose Duato Abstract Second generation multicomputers use wormhole routing, allowing a very low channel set-up time and drastically reducing

More information

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network 1 Global Adaptive Routing Algorithm Without Additional Congestion ropagation Network Shaoli Liu, Yunji Chen, Tianshi Chen, Ling Li, Chao Lu Institute of Computing Technology, Chinese Academy of Sciences

More information

Lecture 2: Topology - I

Lecture 2: Topology - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and

More information

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes EE482, Spring 1999 Research Paper Report Deadlock Recovery Schemes Jinyung Namkoong Mohammed Haque Nuwan Jayasena Manman Ren May 18, 1999 Introduction The selected papers address the problems of deadlock,

More information

SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS*

SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS* SOFTWARE BASED FAULT-TOLERANT OBLIVIOUS ROUTING IN PIPELINED NETWORKS* Young-Joo Suh, Binh Vien Dao, Jose Duato, and Sudhakar Yalamanchili Computer Systems Research Laboratory Facultad de Informatica School

More information

EECS 578 Interconnect Mini-project

EECS 578 Interconnect Mini-project EECS578 Bertacco Fall 2015 EECS 578 Interconnect Mini-project Assigned 09/17/15 (Thu) Due 10/02/15 (Fri) Introduction In this mini-project, you are asked to answer questions about issues relating to interconnect

More information

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Outline Introduction Overview of WiNoC system architecture Overlaid

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

Lecture 3: Flow-Control

Lecture 3: Flow-Control High-Performance On-Chip Interconnects for Emerging SoCs http://tusharkrishna.ece.gatech.edu/teaching/nocs_acaces17/ ACACES Summer School 2017 Lecture 3: Flow-Control Tushar Krishna Assistant Professor

More information

A Literature Review of on-chip Network Design using an Agent-based Management Method

A Literature Review of on-chip Network Design using an Agent-based Management Method A Literature Review of on-chip Network Design using an Agent-based Management Method Mr. Kendaganna Swamy S Dr. Anand Jatti Dr. Uma B V Instrumentation Instrumentation Communication Bangalore, India Bangalore,

More information

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms

Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 9, NO. 6, JUNE 1998 535 Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms Rajendra V. Boppana, Member, IEEE, Suresh

More information

Comparison of Deadlock Recovery and Avoidance Mechanisms to Approach Message Dependent Deadlocks in on-chip Networks

Comparison of Deadlock Recovery and Avoidance Mechanisms to Approach Message Dependent Deadlocks in on-chip Networks Comparison of Deadlock Recovery and Avoidance Mechanisms to Approach Message Dependent Deadlocks in on-chip Networks Andreas Lankes¹, Soeren Sonntag², Helmut Reinig³, Thomas Wild¹, Andreas Herkersdorf¹

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

A NEW DEADLOCK-FREE FAULT-TOLERANT ROUTING ALGORITHM FOR NOC INTERCONNECTIONS

A NEW DEADLOCK-FREE FAULT-TOLERANT ROUTING ALGORITHM FOR NOC INTERCONNECTIONS A NEW DEADLOCK-FREE FAULT-TOLERANT ROUTING ALGORITHM FOR NOC INTERCONNECTIONS Slaviša Jovanović, Camel Tanougast, Serge Weber Christophe Bobda Laboratoire d instrumentation électronique de Nancy - LIEN

More information

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 02, 2014 ISSN (online): 2321-0613 Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek

More information

Boosting the Performance of Myrinet Networks

Boosting the Performance of Myrinet Networks IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. Y, MONTH 22 1 Boosting the Performance of Myrinet Networks J. Flich, P. López, M. P. Malumbres, and J. Duato Abstract Networks of workstations

More information

Computation of Multiple Node Disjoint Paths

Computation of Multiple Node Disjoint Paths Chapter 5 Computation of Multiple Node Disjoint Paths 5.1 Introduction In recent years, on demand routing protocols have attained more attention in mobile Ad Hoc networks as compared to other routing schemes

More information

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services Overview 15-441 15-441 Computer Networking 15-641 Lecture 19 Queue Management and Quality of Service Peter Steenkiste Fall 2016 www.cs.cmu.edu/~prs/15-441-f16 What is QoS? Queuing discipline and scheduling

More information

On Constructing the Minimum Orthogonal Convex Polygon in 2-D Faulty Meshes

On Constructing the Minimum Orthogonal Convex Polygon in 2-D Faulty Meshes On Constructing the Minimum Orthogonal Convex Polygon in 2-D Faulty Meshes Jie Wu Department of Computer Science and Engineering Florida Atlantic University Boca Raton, FL 33431 E-mail: jie@cse.fau.edu

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links Hoda Naghibi Jouybari College of Electrical Engineering, Iran University of Science and Technology, Tehran,

More information

International Journal of Research and Innovation in Applied Science (IJRIAS) Volume I, Issue IX, December 2016 ISSN

International Journal of Research and Innovation in Applied Science (IJRIAS) Volume I, Issue IX, December 2016 ISSN Comparative Analysis of Latency, Throughput and Network Power for West First, North Last and West First North Last Routing For 2D 4 X 4 Mesh Topology NoC Architecture Bhupendra Kumar Soni 1, Dr. Girish

More information

The final publication is available at

The final publication is available at Document downloaded from: http://hdl.handle.net/10251/82062 This paper must be cited as: Peñaranda Cebrián, R.; Gómez Requena, C.; Gómez Requena, ME.; López Rodríguez, PJ.; Duato Marín, JF. (2016). The

More information

INTERCONNECTION NETWORKS LECTURE 4

INTERCONNECTION NETWORKS LECTURE 4 INTERCONNECTION NETWORKS LECTURE 4 DR. SAMMAN H. AMEEN 1 Topology Specifies way switches are wired Affects routing, reliability, throughput, latency, building ease Routing How does a message get from source

More information

Chapter 2 Designing Crossbar Based Systems

Chapter 2 Designing Crossbar Based Systems Chapter 2 Designing Crossbar Based Systems Over the last decade, the communication architecture of SoCs has evolved from single shared bus systems to multi-bus systems. Today, state-of-the-art bus based

More information

Deadlock: Part II. Reading Assignment. Deadlock: A Closer Look. Types of Deadlock

Deadlock: Part II. Reading Assignment. Deadlock: A Closer Look. Types of Deadlock Reading Assignment T. M. Pinkston, Deadlock Characterization and Resolution in Interconnection Networks, Chapter 13 in Deadlock Resolution in Computer Integrated Systems, CRC Press 2004 Deadlock: Part

More information

Multi-path Routing for Mesh/Torus-Based NoCs

Multi-path Routing for Mesh/Torus-Based NoCs Multi-path Routing for Mesh/Torus-Based NoCs Yaoting Jiao 1, Yulu Yang 1, Ming He 1, Mei Yang 2, and Yingtao Jiang 2 1 College of Information Technology and Science, Nankai University, China 2 Department

More information

Performance Analysis of a Minimal Adaptive Router

Performance Analysis of a Minimal Adaptive Router Performance Analysis of a Minimal Adaptive Router Thu Duc Nguyen and Lawrence Snyder Department of Computer Science and Engineering University of Washington, Seattle, WA 98195 In Proceedings of the 1994

More information

On Packet Switched Networks for On-Chip Communication

On Packet Switched Networks for On-Chip Communication On Packet Switched Networks for On-Chip Communication Embedded Systems Group Department of Electronics and Computer Engineering School of Engineering, Jönköping University Jönköping 1 Outline : Part 1

More information

NoC Test-Chip Project: Working Document

NoC Test-Chip Project: Working Document NoC Test-Chip Project: Working Document Michele Petracca, Omar Ahmad, Young Jin Yoon, Frank Zovko, Luca Carloni and Kenneth Shepard I. INTRODUCTION This document describes the low-power high-performance

More information

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology Surbhi Jain Naveen Choudhary Dharm Singh ABSTRACT Network on Chip (NoC) has emerged as a viable solution to the complex communication

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information
