Bandwidth-aware routing algorithms for networks-on-chip platforms M. Palesi 1 S. Kumar 2 V. Catania 1


Published in IET Computers & Digital Techniques
Received on 6th July 2008
Revised on 2nd April 2009
In Special Issue on Networks on Chip
ISSN

Bandwidth-aware routing algorithms for networks-on-chip platforms
M. Palesi 1, S. Kumar 2, V. Catania 1
1 Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, University of Catania, Italy
2 Department of Electronics and Computer Engineering, School of Engineering, Jönköping University, Jönköping, Sweden
mpalesi@diit.unict.it

Abstract: General-purpose routing algorithms for a network-on-chip (NoC) platform may not provide sufficient performance for some communication-intensive applications. One reason is the low adaptivity offered by a general-purpose routing algorithm, which leaves some links highly congested. In this study the authors demonstrate that it is possible to design highly efficient application-specific routing algorithms which distribute traffic more uniformly by using information about the application's communication behaviour (communication topology and communication bandwidth). The authors use off-line analysis to estimate the expected load on the links of the network. The result of this analysis is used, along with the routing adaptivity available in each router, to distribute less traffic to links and paths which are expected to be congested. The methodology for application-specific routing algorithms is extended to incorporate these features and to design highly adaptive deadlock-free routing algorithms which also distribute traffic more uniformly and reduce network congestion. The authors discuss architectural implications and analyse the area and power overheads of the proposed approach on the design of a table-based NoC router.

1 Introduction

Network on chip (NoC) is likely to be used in high-performance multi-core embedded systems in the near future. Many factors affect the performance achieved by an application on an NoC platform.
For applications that require intensive communication among cores, the routing algorithm is the main factor affecting the overall performance of an NoC [1]. Traditionally, routing algorithms have been designed without any reference to the characteristics of the traffic which will stimulate the network. The main reason is that, in a general-purpose domain, the communication traffic cannot be accurately characterised; routing algorithms are therefore designed to provide deadlock freedom under any type of traffic and to give good average performance. As a consequence, the design of the routing algorithm conservatively assumes that all the network nodes may need to communicate with each other.

However, in the application-specific domain, which characterises the area of embedded systems, we assume that an accurate characterisation of the communication traffic is possible [2, 3]. The embedded system designer has good knowledge of the application which will be mapped on the system. This knowledge opens new directions in system optimisation, such as the customisation of the routing algorithm for a given application. Based on this, APSRA, a methodology to design application-specific routing algorithms for NoC systems, was presented in [4].

However, the basic APSRA does not take into account communication attributes such as the bandwidth requirements of the different communicating task pairs mapped on different network nodes. Thus, the selection of the routing paths to be removed, in order to restrict the routing function and guarantee deadlock freedom, is carried out blindly. This is equivalent to assuming that all the communications have the same bandwidth requirements. Such unawareness may lead to a poor distribution of the traffic load over the network, particularly when the range of the bandwidth requirements of different communications is large. Unfortunately, this is a very frequent case in real applications.
In [5], for example, the range of communication bandwidth requirements for a Video Object Plane decoder in an MPEG-4 decoder system spans from 16 to 500 MB/s.

The performance of a routing algorithm designed using the APSRA methodology is also greatly affected by the selection function in the router. This function dynamically chooses one among multiple admissible output ports for a new packet. We propose a new strategy for load estimation and for the design of the selection function. We propose that the application's communication behaviour, along with the routing function (topology, admissible paths, communication bandwidth between pairs etc.), should be analysed off-line and that selection probabilities should be assigned to each admissible output port for packets coming from a certain input port.

The above two considerations motivate this work and our proposal for improving the APSRA methodology. As the traffic characteristics of one communicating node pair generally differ from those of another pair, the pairs should be distinguished. For this reason, we believe that emphasising the role of communication bandwidth requirements during the design of the routing algorithm adds a new degree of freedom in system performance optimisation.

IET Comput. Digit. Tech., 2009, Vol. 3, Iss. 5, pp

2 Related work

An adaptive routing algorithm can be seen as the cascade of two main blocks which implement an adaptive routing function and a selection function (a.k.a. selection policy or selection strategy), respectively. First, the routing function computes the set of admissible output channels towards which the packet can be forwarded to reach the destination. Then, the selection function selects one output channel from the set of admissible output channels depending on dynamic network conditions and/or locally stored information. Both blocks have an important impact on overall network cost and performance and constitute the topic of this paper.

Regarding routing functions, many proposals for wormhole-switched networks have been presented in the literature [6-10].
Glass and Ni [7] propose a turn model for designing deadlock-free and livelock-free wormhole routing algorithms for mesh and hypercube topologies. This model was later used by Chiu [10] to develop the Odd-Even adaptive routing algorithm for meshes without virtual channels. In comparison with the turn model, the degree of routing adaptiveness provided by the Odd-Even model is more even across different source-destination pairs. Murali et al. [11] present a methodology to design application-specific NoCs using floorplan information. The routing function is designed by using the turn prohibition algorithm presented by Starobinski et al. [12]. Starobinski's approach assumes that all the nodes of the network communicate with each other, but this assumption is far from reality, especially in the scenario of a heterogeneous system-on-chip implementing a specific application. Another application-specific design methodology for NoC systems is presented by Srinivasan et al. [13], where virtual channels are used to deal with deadlocks. An application-specific routing algorithm named APSRA has been proposed by Palesi et al. [4]. APSRA exploits communication information to maximise the adaptivity while ensuring deadlock-free routing for an application. The COmmunication Synthesis Infrastructure (COSI) framework [14] is used to define specific interconnect design flows for a variety of applications from chips to systems. Routing is modelled in a way that is very similar to the definition of routing tables in APSRA [4]. Moreover, as in APSRA, the definition of deadlock is based on the channel dependency graph. Our current work extends the APSRA methodology to achieve the twofold objective of maximising adaptivity and distributing traffic more uniformly over the network.
As regards selection functions, Schwiebert and Bell [15] presented a detailed simulation study of various selection functions for several fully adaptive wormhole routing algorithms for 2D meshes. Their results show that the choice of selection function has a significant effect on the average message latency and saturation behaviour. Similar conclusions have been drawn by Feng and Shin [16]. Martinez et al. [17] analysed several selection functions to evaluate their influence on network performance, observing improvements in network throughput (up to 10%) and in latency when the network is close to saturation (up to 40%). Hu and Marculescu [18] propose a routing scheme called DyAD which combines the advantages of both deterministic and adaptive routing schemes. The router works in deterministic mode when the network is not congested, and switches to an adaptive mode when the network becomes congested. Ye et al. [19] present a contention-look-ahead on-chip routing scheme that is similar to [20]. It is a non-minimal routing scheme in the sense that, based on the value of two delay penalty indices, the router chooses whether to send the packet along a profitable route (minimal route) or a misroute (non-minimal route). The proposed approach has not been proved to be deadlock free. Unlike the other approaches, which focus on output selection, the authors of [21] investigate the impact of input selection and present a contention-aware input selection technique that improves the routing efficiency. The concept of neighbours-on-path has been defined by Ascia et al. [22] to design a new selection policy which takes decisions based on the status of the nodes belonging to the admissible paths from the current node. There is an abundance of work on path selection with bandwidth and latency awareness [23, 24].
Extensive research on these topics has been carried out in the context of telecommunication and data networks. To the best of our knowledge, bandwidth-aware routing is a topic that has been left largely untouched in the context of on-chip interconnection networks. Except for APSRA, none of the aforementioned works exploits application information to optimise the routing algorithm.

© The Institution of Engineering and Technology 2009

Although APSRA uses communication topology and communication concurrency information, other important information could be exploited to improve the effectiveness of a routing algorithm. Communication bandwidth is one such feature. General routing algorithms assume that all the communications are characterised by the same bandwidth requirements. This behaviour is rarely observed in real applications. For instance, in the task graph of the multimedia application [25] shown in Fig. 7, communication bandwidth requirements range from 10 to 500 MB/s. To the best of our knowledge, there are no contributions aimed at improving the performance of routing algorithms by exploiting communication bandwidth information. This paper contributes in this direction by presenting a methodology to design application-specific and bandwidth-aware routing functions along with a novel selection policy.

This paper is an extension of [26]. Extensions include power, area and timing analysis of the router implementing the proposed routing scheme; delay, throughput and energy analysis of the NoC; and both an informal and a formal description of the methodology.

3 Terminology and problem formulation

Simply stated, for a given application and a given network topology, the goal is to generate a routing algorithm which is strongly adaptive and spreads the traffic over the network in such a way that the communication traffic of any link does not exceed its capacity (maximum sustainable bandwidth). To formulate the problem more formally, we borrow the following terms from [4].

The communication graph, $CG = G(T, C)$, is a directed graph, where $T$ is the set of tasks and $C$ is the set of communications. Each communication $c_{i,j} = (t_i, t_j) \in C$ connects task $t_i \in T$ to task $t_j \in T$.
For a communication $c \in C$, the function $B(c)$ returns the bandwidth requirement, that is, the minimum bandwidth that should be allocated by the network in order to meet the performance constraints for communication $c$.

The topology graph, $TG = G(N, L)$, is a directed graph which models the network topology. $N$ is the set of network nodes, and $L$ is the set of network channels. Channel $l_{i,j} = (n_i, n_j)$ connects node $n_i \in N$ to node $n_j \in N$. Given a channel $l \in L$, the function $\mathrm{Cap}(l)$ returns its capacity.

The mapping function, $M: T \to N$, maps tasks to network nodes [e.g. if $M(t_i) = n_j$ then task $t_i$ is mapped on node $n_j$ of the network].

3.1 Link load estimation

As we are dealing with adaptive routing, the required bandwidth for communication $c$ is split over the multiple paths that the routing function allows for $c$. For the sake of example, consider Fig. 1, which shows a 4 × 2 mesh-based network topology. Let us suppose that communication $c = (t_s, t_d)$ requires a bandwidth of 100 MB/s and that the routing function allows all the minimal paths from node $n_s = M(t_s)$ to node $n_d = M(t_d)$ (four paths in total). The load is distributed over the paths as shown in Fig. 1, which reports, for each network channel, the effective bandwidth (or effective load) (EB) and the total number of paths containing that channel.

Figure 1 Effective bandwidth for a communication from node $n_s$ to node $n_d$ at 100 MB/s assuming a fully adaptive minimal routing

Formally, the effective bandwidth of a channel $l \in L$ due to a communication $c \in C$ can be computed as

$$\mathrm{EB}(c, l) = B(c) \times \frac{|\mathrm{PT}(c, l)|}{|P(c)|}$$

where $P(c)$ denotes the set of minimal paths admitted by the routing function for communication $c$, and $\mathrm{PT}(c, l) = \{P \in P(c) : l \in P\}$ is the pass through link set, that is, the set of paths of $c$ which contain the link $l$.
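The effective bandwidth definition can be tried out on the 4 × 2 mesh example. The following is a minimal sketch, not the authors' tool: the coordinate convention (x grows eastwards, y grows southwards), the channel encoding as node pairs and the function names `minimal_paths` and `effective_bandwidth` are all assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction

def minimal_paths(src, dst):
    """Enumerate all minimal paths on a 2D mesh as lists of channels.

    Nodes are (x, y) tuples and a channel is a (node, node) pair.
    (Hypothetical encoding, chosen only for this sketch.)
    """
    if src == dst:
        return [[]]
    (x, y), (tx, ty) = src, dst
    next_nodes = []
    if x != tx:                      # one productive hop along x
        next_nodes.append((x + (1 if tx > x else -1), y))
    if y != ty:                      # one productive hop along y
        next_nodes.append((x, y + (1 if ty > y else -1)))
    paths = []
    for nxt in next_nodes:
        for tail in minimal_paths(nxt, dst):
            paths.append([(src, nxt)] + tail)
    return paths

def effective_bandwidth(bandwidth, paths):
    """EB(c, l) = B(c) * |PT(c, l)| / |P(c)| for every channel l used by c."""
    pass_through = Counter(ch for path in paths for ch in path)
    return {ch: Fraction(bandwidth * n, len(paths))
            for ch, n in pass_through.items()}

# 100 MB/s from the north-west corner to the south-east corner of a 4 x 2 mesh.
paths = minimal_paths((0, 0), (3, 1))
eb = effective_bandwidth(100, paths)
east = ((0, 0), (1, 0))    # first hop towards east
south = ((0, 0), (0, 1))   # first hop towards south
print(len(paths), eb[east], eb[south])
```

With these assumptions the sketch finds four minimal paths; three of them leave the source through the east channel and one through the south channel, so the two channels carry three quarters and one quarter of the 100 MB/s respectively.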
Finally, we indicate with $\mathrm{AB}(l)$ the aggregate bandwidth of $l$, which is computed as

$$\mathrm{AB}(l) = \sum_{c \in C} \mathrm{EB}(c, l)$$

Using these definitions, a solution to the bandwidth-aware routing problem must meet the following constraint. Given a communication graph $CG$, a topology graph $TG$ and a mapping function $M$, find a routing function $R$ which is deadlock free and such that

$$\forall l \in L \Rightarrow \mathrm{AB}(l) \le \mathrm{Cap}(l) \qquad (1)$$

that is, the communication load of any channel $l$ must not exceed its capacity $\mathrm{Cap}(l)$.

4 The proposed methodology

In this section we provide a high-level overview of the proposed methodology and discuss its assumptions and limitations.

4.1 Overview

An overview of the proposed methodology is shown in Fig. 2. The application is modelled by means of a communication graph. The communication graph, together with the topology graph and a mapping function which defines where each task is mapped on the NoC, represents the input of the proposed methodology. This information is used to build the application-specific channel dependency graph (ASCDG) [4]. If it contains cycles, they are iteratively broken by removing application-specific dependencies selected by means of a procedure that will be discussed in Section 5.1. The heuristic behind this procedure is to assign more adaptivity to communications characterised by higher communication bandwidth requirements. As soon as all the cycles have been removed, the routing function is deadlock free. Then, a link load analysis is performed to identify links whose aggregated bandwidth exceeds the link capacity. If any are found, a load balancing procedure, which will be described in Section 5.2, is used to selectively remove routing paths and reduce the aggregated bandwidth on the overloaded links. At the same time, it tries to allocate alternative routing paths in such a way that load is distributed almost equally among links. As a result a new deadlock-free routing function is obtained. Finally, a set of selection probabilities, which will be used by the selection policy described in Section 5.3, is computed.

Figure 2 Block diagram of the proposed design flow

4.2 Assumptions and scope

In this work two important issues are not covered. The first is related to the way in which the communication characteristics induced by the application are modelled, and the second concerns the out-of-order delivery problem which characterises any adaptive routing algorithm.

In the overview of the proposed methodology we assumed that the input application is already mapped and scheduled on the NoC platform before the design of the routing algorithm starts. We also assumed that the communication volume between tasks (and hence between cores after the mapping step) has already been determined using application profiling. It should be pointed out that, although a bandwidth-annotated communication graph (also known as the core graph, the communication task graph or the application characterisation graph) is generally used as the entry point in many design methodologies [2, 3, 27], the application profiling task, which determines the communication volume between tasks (even before the application and its communication are mapped onto the platform), is still an open issue. In this context, the design space exploration tool from hartes could be useful [28]. Another example is the task graph extraction (TGE) tool from Princeton [29].

The way in which communications are characterised in this work is a simplification of the problem. A communication is characterised only in terms of its maximum bandwidth requirement, without considering other important communication attributes such as burstiness. This simplified model of communication behaviour results in a pessimistic analysis, as we assume that a communication demands the same bandwidth (the maximum bandwidth) for its entire lifetime. Further, the assumption that all communications are potentially concurrent results in an exaggerated communication traffic density which may never occur if communication dependencies are taken into consideration. This may reduce the actual benefit of our schemes when applied to real applications in which the degree of communication concurrency is lower.

The second open issue in this work is related to the routing algorithm.
Although the routing algorithm we propose is multi-path, we do not take into consideration the mechanisms required for reordering packets at the destination. To cope with the out-of-order packet delivery problem which characterises any adaptive routing algorithm, one possibility is to use the re-ordering mechanism at network reconvergent nodes proposed by Murali et al. [30]. In this case the routing function must be restricted so as to remove all the intersecting paths for each source/destination pair. However, this strongly impacts the effectiveness of the proposed routing algorithm, since one of its main benefits (high adaptivity) is reduced. In this work, however, we distinguish between application performance and network performance, although the former depends on the latter. Our focus is to improve network performance (network latency and throughput) and not application performance. That is, the proposed routing method, like other adaptive routing algorithms, is more useful to applications which can tolerate out-of-order delivery of packets.

5 Bandwidth-aware routing algorithm

In this section we present our proposal for designing highly adaptive, deadlock-free and bandwidth-aware routing algorithms. The section is organised in three subsections. The first subsection presents the strategy used to select and remove the dependencies in the ASCDG which minimise the amount of bandwidth that must be redistributed among the remaining routing paths. The second subsection deals with the problem of detecting and recovering from situations in which the aggregated bandwidth on some network links exceeds the link capacity. Finally, the last subsection describes a new selection function aimed at exploiting the peculiarities of the proposed routing function.

5.1 Bandwidth-aware routing function

A cycle in the ASCDG is a succession of application-specific direct dependencies $D = \{d_1, d_2, \ldots, d_n\}$, where each $d \in D$ is a pair $(l_i, l_j)$ with $l_i, l_j \in L$.
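To make the notion of a cycle of direct dependencies concrete, the sketch below builds a dependency graph from a set of admissible paths and searches it for one cycle with a depth-first search. It is only an illustration: it assumes that a direct dependency $(l_i, l_j)$ exists whenever some admissible path uses $l_j$ immediately after $l_i$, and the names `build_dependency_graph` and `find_cycle` are hypothetical, not taken from APSRA.

```python
def build_dependency_graph(paths):
    """Map each channel to the channels that some path uses right after it."""
    graph = {}
    for path in paths:
        for ch in path:
            graph.setdefault(ch, set())
        for cur, nxt in zip(path, path[1:]):
            graph[cur].add(nxt)
    return graph

def find_cycle(graph):
    """Return one cycle as a list of dependencies (l_i, l_j), or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on the stack / done
    colour = {ch: WHITE for ch in graph}
    stack = []

    def visit(ch):
        colour[ch] = GREY
        stack.append(ch)
        for nxt in graph[ch]:
            if colour[nxt] == GREY:       # back edge: a cycle is closed
                loop = stack[stack.index(nxt):] + [nxt]
                return list(zip(loop, loop[1:]))
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        colour[ch] = BLACK
        return None

    for ch in graph:
        if colour[ch] == WHITE:
            found = visit(ch)
            if found:
                return found
    return None

# Four two-hop paths turning clockwise around a 2 x 2 mesh (channels by name).
ring_paths = [["AB", "BC"], ["BC", "CD"], ["CD", "DA"], ["DA", "AB"]]
cycle = find_cycle(build_dependency_graph(ring_paths))
print(cycle)
```

In this toy example the four clockwise paths create the dependencies AB to BC, BC to CD, CD to DA and DA to AB, which close a cycle of four dependencies; with only the first path the graph is acyclic and `find_cycle` returns `None`.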
The problem is then to select the best dependency to remove in order to break the cycle $D$. Removing a dependency means removing all the paths which use that dependency. As soon as a path is removed, the fraction of bandwidth it transports must be redistributed between the remaining paths. For instance, suppose that the direct dependency $d$ between channel $l_i$ and channel $l_j$ in Fig. 1 must be removed to break a cycle in the ASCDG. Removing $d$ means prohibiting path 3. As soon as path 3 is removed, the 25 MB/s it transports is redistributed between path 1 and path 2 as shown in Fig. 3a.

Figure 3 Bandwidth allocation
a After removing channel dependency from $l_i$ to $l_j$ in Fig. 1
b After removing path 2 from a

The idea we propose
in this paper is to choose and remove the dependency $d$ which minimises the additional bandwidth that must be allocated to the remaining paths that do not use $d$.

Formally, let us indicate with $\mathrm{PT}_2(c, d)$ the pass through dependency set, that is, the set of paths of $c$ which use the dependency $d = (l_1, l_2)$:

$$\mathrm{PT}_2(c, d) = \mathrm{PT}(c, l_1) \cap \mathrm{PT}(c, l_2)$$

Let $d$ be an application-specific direct dependency. To remove $d$, all the paths of any communication $c$ which use $d$ must be removed. For communication $c$ the aggregated bandwidth to be redistributed is $[B(c)/|P(c)|] \times |\mathrm{PT}_2(c, d)|$. This bandwidth is redistributed between the $|P(c)| - |\mathrm{PT}_2(c, d)|$ remaining paths which do not use the dependency $d$, so each of them receives the ratio of these two quantities as additional load. Based on this, the dependency to be removed is the $d \in D$ such that the cost function

$$\mathrm{cost}(d) = \sum_{c \in C} \frac{B(c) \times |\mathrm{PT}_2(c, d)|}{|P(c)| \times (|P(c)| - |\mathrm{PT}_2(c, d)|)} \qquad (2)$$

is minimised. This ensures that the dependency chosen for removal is the one whose load, once redistributed, results in the minimum increase in load on the alternative paths.

The cycle breaking algorithm is shown in Fig. 4. First, all the cycles of the ASCDG are detected by the function GetAllCycles and stored in the list cycles. Then, the so-called enumeration tree is built. The meaning of the enumeration tree is as follows. The order in which the cycles in the ASCDG are treated determines both the overall adaptivity of the generated routing algorithm and the routability of all the communications. More precisely, with regard to the second point, certain cycle removal sequences might make some communications unroutable. In our implementation we used a back-tracking mechanism in which removal sequences are generated by performing a depth-first search of the solution space. Fig. 5 shows the enumeration tree generated by four cycles $c_1, c_2, c_3, c_4$. If, for instance, the removal sequence $c_1 \to c_2$ causes reachability problems then the sub-tree under $c_1 \to$
$c_2$ is not considered for further analysis. The back-tracking mechanism returns to $c_1$. If the removal sequence $c_1 \to c_3 \to c_2$ results in a reachability problem then the back-tracking mechanism returns to $c_3$. If the removal sequence $c_1 \to c_3 \to c_4 \to c_2$ is feasible (i.e. it does not result in reachability problems) the procedure terminates.

Figure 4 Break cycles algorithm

Figure 5 Enumeration of cycle sequences for four cycles

The steps to break all the cycles of the ASCDG start from line 6 in Fig. 4. First, a backup of ASCDG, $C$ and $P$ is performed. Then, a cycle sequence cseq is extracted from the enumeration tree. The steps from lines 10 to 22 remove all the cycles in the sequence defined by cseq. For each such cycle, only the channel dependencies whose removal does not cause reachability problems are considered. This check is performed by ensuring that there is no communication all of whose routing paths use the channel dependency (line 13). Then, the channel dependency $d'$ which minimises the cost function (2) is selected and removed from the ASCDG (line 27). Next, all the routing paths which use $d'$ are removed from the set of admissible paths (line 28). In case of reachability problems (line 24), the ASCDG, $C$ and $P$ are
restored and the sub-tree of the enumeration tree rooted at the cycle whose removal caused the reachability problems is pruned (line 25). In this case a new iteration is performed with a new sequence of cycles (line 6).

The overall time complexity of the algorithm is $O(2^n)$, where $n$ depends on the size of the rectangle containing the source and destination nodes (in the case of a mesh-based topology). The complexity of the proposed approach is due not to the heuristic itself but to the computation of the ASCDG. The construction of the ASCDG involves the annotation of each minimal path between any source/destination pair defined in the communication graph. The basic assumption is that we start from a minimal fully adaptive routing algorithm. Thus, as the NoC size increases, the approach could become infeasible if some nodes located far from each other need to communicate. It should be pointed out, however, that this is the worst-case condition. In fact, any topological mapping algorithm tries to map the most frequent and most critical communications in such a way as to minimise the physical distance between the source and destination nodes. For long-distance communications, one can consider a subset of all the minimal paths. A detailed analysis of the complexity of building the ASCDG has been presented in our previous work [4].

5.2 Bandwidth reallocation

Using the procedure discussed in the previous subsection, we obtain a routing function which is deadlock free (as the ASCDG is acyclic) and which generates a set of routing paths by providing more adaptivity to communications characterised by higher communication bandwidth. However, it is possible that the aggregate bandwidth on some network links exceeds the capacity of these links [i.e. condition (1) is not satisfied for some $l \in L$].
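A check of condition (1) against a set of admissible paths can be sketched as follows. This is a minimal illustration under stated assumptions: every communication splits its bandwidth evenly over its admissible paths (as in Section 3.1), all links share one capacity value, and the data layout and the names `aggregate_bandwidth` and `overloaded_links` are hypothetical.

```python
from collections import defaultdict

def aggregate_bandwidth(communications):
    """AB(l): sum over communications of B(c) * |PT(c, l)| / |P(c)|.

    `communications` is a list of (bandwidth, paths) pairs, where each
    path is a list of channel identifiers.
    """
    ab = defaultdict(float)
    for bandwidth, paths in communications:
        share = bandwidth / len(paths)    # bandwidth carried by each path
        for path in paths:
            for ch in path:
                ab[ch] += share
    return dict(ab)

def overloaded_links(ab, capacity):
    """Channels violating condition (1), most loaded first."""
    over = [ch for ch, load in ab.items() if load > capacity]
    return sorted(over, key=lambda ch: ab[ch], reverse=True)

# Hypothetical example: two communications sharing channel "a".
comms = [
    (100, [["a", "b"], ["c", "d"]]),      # 100 MB/s over two paths
    (40, [["a", "e"]]),                   # 40 MB/s over a single path
]
ab = aggregate_bandwidth(comms)
print(ab["a"], overloaded_links(ab, capacity=60))
```

In this made-up example channel "a" carries half of the first communication plus all of the second, so it is the only channel above a 60 MB/s capacity.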
When condition (1) is violated on a link, some of the routing paths passing through that link must be removed to reduce its aggregate bandwidth down to the link's capacity or, more generally, down to a user-defined value. For instance, looking again at Fig. 3a, if either the capacity of the network links is 50 MB/s or we want the link load not to exceed 50 MB/s, path 2 should be removed as shown in Fig. 3b.

The proposed bandwidth reallocation algorithm is shown in Fig. 6. The input parameters are the set of network links, the set of communications, the set of admissible paths derived from the procedure described in the previous subsection and a threshold which defines the maximum bandwidth that must not be exceeded on any network link. The output is the updated set of routing paths.

Figure 6 Bandwidth reallocation algorithm

The procedure starts by sorting the network links in descending order of aggregate bandwidth. For each link $l$ and for each communication $c$ which has more than one path, at least one of which uses $l$, two lists named paths2rem and paths2enr are generated as follows. paths2rem contains all the paths of $c$ that should be removed because they use network links whose load exceeds the threshold. paths2enr contains those paths that can take on additional load (i.e. can be enriched) because they use links whose load is below the threshold. Then, the list paths2rem is scanned and the routing paths belonging to it are removed from $P$. Of course, removing a path causes the bandwidth allocated to it to be redistributed over the other paths belonging to paths2enr (see, for example, Fig. 3). Thus, the path elimination stops when there is at least one path in paths2enr that contains a link whose load exceeds the threshold. The above steps are repeated until the load on each link does not exceed the threshold.
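The flavour of this procedure can be conveyed with a much simpler greedy loop. The sketch below is not the algorithm of Fig. 6 (it does not build the paths2rem and paths2enr lists); it is a simplified variant, written under the assumption of a single scalar threshold, which repeatedly takes the most loaded link and drops the one path through it whose removal leaves the lowest peak load. The names `link_loads` and `reallocate` and the data layout are hypothetical.

```python
def link_loads(comms):
    """AB(l) for every channel, with B(c) split evenly over the paths of c."""
    loads = {}
    for bandwidth, paths in comms:
        for path in paths:
            for ch in path:
                loads[ch] = loads.get(ch, 0.0) + bandwidth / len(paths)
    return loads

def reallocate(comms, threshold):
    """Greedily drop paths until no channel load exceeds `threshold`.

    Raises ValueError when relieving the worst channel would require
    removing the last path of a communication (reachability issue).
    """
    comms = [(bw, list(paths)) for bw, paths in comms]    # work on a copy
    while True:
        loads = link_loads(comms)
        worst = max(loads, key=loads.get)
        if loads[worst] <= threshold:
            return comms
        # Paths through the worst channel whose communication has an alternative.
        candidates = [(i, p) for i, (_, paths) in enumerate(comms)
                      if len(paths) > 1 for p in paths if worst in p]
        if not candidates:
            raise ValueError("cannot relieve %r without losing reachability"
                             % (worst,))

        def peak_after(candidate):
            """Highest channel load if the candidate path were removed."""
            i, p = candidate
            trial = [(bw, [q for q in paths if q is not p] if j == i else paths)
                     for j, (bw, paths) in enumerate(comms)]
            return max(link_loads(trial).values())

        i, p = min(candidates, key=peak_after)
        comms[i][1].remove(p)

# Hypothetical example: channel "a" is shared and initially overloaded.
example = [
    (100, [["a", "b"], ["c", "d"], ["e", "f"]]),
    (40, [["a", "g"]]),
]
result = reallocate(example, threshold=60)
print(result[0][1])
```

In this made-up example the path through the shared channel "a" is dropped, after which the first communication splits its 100 MB/s over the two remaining paths and no channel exceeds 60 MB/s. With a threshold of 30 MB/s the same input cannot be fixed and the sketch raises `ValueError`, mirroring the abort case of the procedure.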
The reallocation procedure aborts if the path elimination step cannot be carried out because of reachability issues, which arise when the path to be removed is the only one left for a certain communication. Although the presented algorithm assumes that all the network links have the same capacity, it is simple to generalise by replacing the scalar input parameter threshold with a function $T: L \to \mathbb{R}$ which returns the bandwidth threshold associated with any channel $l \in L$. In this case, the condition $\mathrm{AB}(l) > \mathit{threshold}$ in lines 5, 15 and 24 is replaced with $\mathrm{AB}(l) > T(l)$.

5.3 Load balancing selection function

To be effective, a good routing function must be coupled with an intelligent selection function. In fact, selection schemes
strongly affect the overall performance of any adaptive routing algorithm [15-17]. Generally, selection policies take decisions based on on-line measurement or estimation of traffic density. However, such estimation is a costly and difficult task. One way to implement the selection function is to distribute packets randomly among the admissible output ports. However, this selection policy can lead to a large load imbalance on network links and, in practice, degrade network performance. On-line information about traffic density and congestion on the paths leading to the packet destination can be useful in selecting the appropriate admissible port. Most current approaches use local information about the usage of the buffer associated with an output port in the router (or in the neighbouring router in that direction) as a measure of the communication traffic in that direction [18]. Some approaches use more elaborate look-ahead strategies for this purpose [22]. These selection strategies give better latency performance, especially when communication volume is high.

Figure 7 Communication graph of the MMS

The idea behind the proposed selection policy can be summarised by means of an example. Let us consider again Fig. 1. Let us suppose that all four minimal paths from node $n_s$ to node $n_d$ are allowed by the routing function. When $n_s$ receives a header flit destined for $n_d$, the routing function returns, as the set of admissible output channels, the set {east, south}. Now, let us suppose that the router in node $n_s$ is aware of the number of admissible paths that reach node $n_d$ starting from channel east and from channel south, respectively. In our example, there are three paths from east and one path from south. So, the selection policy should use the east output channel with higher probability than the south output channel (e.g. use the east port with probability 0.75 and the south port with probability 0.25).
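The example above can be sketched in a few lines. This is an illustrative sketch rather than the router implementation: it assumes the per-port path counts are already known, derives the selection probabilities from them and picks a port by locating a random draw within the cumulative probability intervals. The names `selection_probabilities` and `select_port` are hypothetical.

```python
import random

def selection_probabilities(path_counts):
    """Probability of each admissible port, proportional to its path count."""
    total = sum(path_counts.values())
    return {port: n / total for port, n in path_counts.items()}

def select_port(probabilities, xi):
    """Return the port whose cumulative-probability interval contains xi.

    `xi` is expected to be a draw from a uniform distribution on [0, 1).
    """
    cumulative = 0.0
    for port, p in probabilities.items():
        cumulative += p
        if xi < cumulative:
            return port
    return port   # guard against rounding when xi is at the upper end

# Three admissible paths start from east and one from south, as in the text.
probs = selection_probabilities({"east": 3, "south": 1})
print(probs)
print(select_port(probs, random.random()))
```

In hardware the probabilities would be precomputed off-line and only the draw and the comparison would happen at run time; the dictionary used here is just a convenient stand-in for such a table.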
Formally, let $\xi$ be a uniformly distributed random variable in the interval [0, 1], and $\{l_1, l_2, \ldots, l_n\}$ the set of admissible output channels returned by the routing function; then the selection function is defined as

$$S(l_1, l_2, \ldots, l_n) = l_i, \quad i : \xi \in \left[ \sum_{j=1}^{i-1} \Pr\{l_j\}, \; \sum_{k=1}^{i} \Pr\{l_k\} \right] \qquad (3)$$

where $\Pr\{l\}$ indicates the probability of selecting output channel $l$, which is proportional to the number of admissible paths starting from $l$ that can be used to reach the destination. These probabilities are computed off-line and stored in the router as discussed in Section 7.1.

6 Evaluation and results

We evaluate the proposed approach on both synthetic and real traffic scenarios. As synthetic traffic scenarios, we consider uniform, transpose, bit-reversal, shuffle, butterfly and hot-spot [31]. For these, the bandwidth of each communicating pair has been randomly generated between 10 and 100 MB/s. As a more realistic communication scenario we consider a generic multimedia system (MMS) which includes an H.263 video encoder, an H.263 video decoder, an mp3 audio encoder and an mp3 audio decoder [25]. The communication graph of the MMS is depicted in Fig. 7. It has been partitioned into 40 distinct tasks which have been mapped on a 5 × 5 mesh-based NoC using the mapping technique proposed in [32]. In the following we refer to the approach proposed in [4] as APSRA, to the variant of APSRA using the heuristic presented in Section 5.1 as APSRA-BW, and to the version of APSRA-BW augmented with the bandwidth reallocation procedure discussed in Section 5.2 as APSRA-BWL.

We organise this section in two subsections. In the first one, we perform a bandwidth analysis aimed at showing how the proposed approach makes it possible to (i) distribute the communication bandwidth uniformly over the network links, and (ii) prevent the bandwidth allocated on network links from exceeding the link capacity. In the second subsection, we perform a dynamic analysis using a flit-accurate NoC simulator to show the performance improvements in terms of both delay and throughput.
6.1 Bandwidth analysis

Let us start by showing the effectiveness of the proposed approach in uniformly distributing the traffic over the network. To do this, we use as a metric the standard deviation of the aggregate bandwidth in the network links. Using this metric, we compare APSRA, APSRA-BW and APSRA-BWL on an 8 × 8 mesh-based NoC under different traffic scenarios. For APSRA-BWL, we fix the threshold to 90% of the maximum aggregate bandwidth obtained when fully adaptive minimal routing is used. For each traffic scenario, Table 1 reports the percentage reduction of the standard deviation of the aggregated bandwidth in network links when APSRA-BW and APSRA-BWL are used.

Table 1 Percentage reduction of standard deviation of the aggregated bandwidth in network links

| Traffic | APSRA-BW | APSRA-BWL |
|---|---|---|
| uniform | | |
| bit-reversal | | |
| butterfly | 0 | 2 |
| shuffle | | |
| transpose1 | 0 | 2 |
| transpose2 | 0 | 2 |
| hot-spot_c | | |
| hot-spot_tr | 5 | 10 |
| MMS | 5 | 5 |
| Average | | |

As can be seen, the proposed heuristic for breaking the cycles of the ASCDG distributes the bandwidth more evenly across the network. In some situations there is no reduction in standard deviation. This is the case for transpose and butterfly traffic, in which the ASCDG is acyclic and the cutting edge heuristic is therefore not applied. On average, the standard deviation of the aggregated bandwidth in network links decreases by 10%. An additional improvement of 2% is obtained when the bandwidth redistribution procedure is used. On the other hand, as discussed in Section 5.2, the elimination of some routing paths by the bandwidth redistribution procedure negatively affects the adaptiveness [10] of the routing function, as shown in Fig. 8. It is interesting to observe that, for some traffic scenarios such as bit-reversal and shuffle, the adaptivity of APSRA-BW is higher than that of APSRA. Although the main objective of APSRA is the maximisation of adaptivity, the heuristic it uses to break cycles stops as soon as the first solution is found.
At any rate, as can be observed, the average adaptivity remains much higher than that of odd-even [10]. Fig. 9 shows the aggregate bandwidth of each link of a 9 × 9 mesh-based NoC under uniform traffic for the routing algorithms generated by APSRA and by APSRA-BWL. The threshold has been fixed to 550 MB/s. As can be observed, when APSRA is used, the aggregate bandwidth in several links exceeds the threshold. If this threshold represents the network link capacity, such bandwidth excesses translate into local network congestion that, because of the back-pressure mechanism combined with wormhole switching, propagates to the entire network and causes a strong degradation of overall network performance.

Figure 8 Adaptivity exhibited by odd-even and by routing algorithms generated by APSRA, APSRA-BW and APSRA-BWL under different traffic scenarios

Figure 9 Aggregate bandwidth per link for a 9 × 9 mesh-based NoC under uniform traffic. Routing algorithm used is generated by APSRA (top) and APSRA-BWL (bottom)

Fig. 10 shows the absolute number of network links which exceed a given threshold when APSRA, APSRA-BW and APSRA-BWL are used. As can be observed, both APSRA-BW and APSRA-BWL reduce the number of bandwidth violations as compared to APSRA. On average, the number of links exceeding the threshold when APSRA-BWL is used is about half of that obtained when APSRA is used. In particular, APSRA-BWL makes it possible to meet bandwidth constraints which are almost 30 and 20% more stringent as compared to APSRA and APSRA-BW, respectively.
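As an illustration of the two metrics used in this subsection, the following Python sketch computes the standard deviation of the aggregate link bandwidth, its percentage reduction with respect to a baseline, and the links whose aggregate bandwidth exceeds a threshold. The function names, the dictionary representation of link loads and the choice of the population standard deviation are assumptions made for this example only; the link loads in the usage part are made-up values, not results from the experiments.

```python
from statistics import pstdev


def load_spread(link_bw):
    """Population standard deviation of the aggregate bandwidth (MB/s) over all links.

    link_bw maps a link identifier to its aggregate bandwidth (hypothetical layout).
    """
    return pstdev(list(link_bw.values()))


def std_reduction_percent(baseline_bw, improved_bw):
    """Percentage reduction of the standard deviation relative to the baseline."""
    base = load_spread(baseline_bw)
    if base == 0:
        return 0.0  # perfectly balanced baseline: nothing to reduce
    return 100.0 * (base - load_spread(improved_bw)) / base


def links_over_threshold(link_bw, threshold):
    """Identifiers of the links whose aggregate bandwidth exceeds the threshold."""
    return sorted(link for link, bw in link_bw.items() if bw > threshold)


if __name__ == "__main__":
    # Illustrative loads only (MB/s); both allocations carry the same total traffic.
    baseline = {"a": 600, "b": 200, "c": 400, "d": 400}
    balanced = {"a": 500, "b": 300, "c": 400, "d": 400}
    print(std_reduction_percent(baseline, balanced))
    print(links_over_threshold(baseline, 550), links_over_threshold(balanced, 550))
```

Counting the entries returned by `links_over_threshold` for a range of thresholds gives the kind of curve plotted in Fig. 10.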

Figure 10 Absolute number of network links which exceed the threshold when APSRA, APSRA-BW and APSRA-BWL are used

6.2 Performance analysis

Now, we evaluate the different routing algorithms in terms of average delay. Delay is defined as the time (in clock cycles) that elapses from the injection of a header flit into the network at the source node to the reception of the tail flit at the destination node. Noxim [33] is used as the NoC simulation platform. A Poisson packet injection distribution is used for the synthetic traffic scenarios, whereas a self-similar packet injection distribution is used for the MMS scenario (self-similar traffic has been observed in the bursty traffic between on-chip modules in typical multimedia applications [34]).

Fig. 11 shows the average delay variation under uniform traffic for different ranges of communication bandwidth. That is, the bandwidth for each communicating pair has been randomly generated between the lower and upper bounds reported on the x-axis. In this experiment both the random selection policy (RND oblivious routing) and the load balancing selection policy (LB adaptive routing) have been used to distinguish the effect of the selection policy. The graph also reports results for deterministic XY routing and adaptive odd-even routing. Once again, APSRA-BW and APSRA-BWL outperform APSRA. For a given average delay, APSRA-BW and APSRA-BWL are able to sustain higher-bandwidth communication traffic than APSRA. The performance improvement over XY and odd-even is even more evident.

Figure 11 Average delay variation under uniform traffic for different ranges of communication bandwidth

Fig. 12 shows average delay, throughput and energy for different packet injection rate (pir) factors under the MMS traffic scenario.
That is, starting from the communication graph of the application, we compute the pir of any communication c as

$$\mathrm{pir}(c) = \frac{\text{communication bandwidth of } c}{\text{packet size} \times \text{flit size} \times \text{clock frequency}}$$

Thus, a point in the graph at a given pir factor p is computed by simulating the network using a pir value of p × pir(c) for each communication c. As can be observed, both the oblivious routings (odd-even and APSRA with random selection function) and the adaptive routings (APSRA-BW and APSRA-BWL with LB selection function) outperform XY deterministic routing. For instance, looking at Figs. 12a and b, moving from XY to odd-even, the pir factor which saturates the network (a network is said to start saturating when an increase in applied load does not result in a linear increase in throughput [35]) increases by 33%. An additional improvement of 25% is obtained when application-specific routing is used. Finally, the use of an effective selection function like that proposed in this paper adds a further 10 and 40% of improvement when APSRA-BW and APSRA-BWL are considered, respectively.

Fig. 12c shows the average energy per cycle per flit for different pir factors. We used the high-level energy estimation feature provided by the Noxim simulator to compute the energy numbers [22]. Note that the values beyond the saturation pir factor do not carry useful information, as there the network is congested and flits spend much of their travel time waiting in router buffers. Thus, considering the range of pir factors where none of the algorithms is saturated, we observe that application-specific routing algorithms are more than 6 and 5% more energy efficient than XY and odd-even, respectively. If we restrict the analysis to APSRA, APSRA-BW and APSRA-BWL, we observe that the proposed approach reduces energy consumption by 6%.
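The pir computation introduced at the beginning of this discussion can be sketched in a few lines of Python. The function names and the unit conventions (bandwidth in bits per second, packet size in flits, flit size in bits, 1 MB taken as 10^6 bytes) are assumptions of this example; the packet size and clock frequency in the usage part are hypothetical values chosen only to make the arithmetic concrete, while the 64-bit flit size matches the router configuration of Section 7.2.

```python
def mbytes_to_bits_per_s(mb_per_s):
    """Convert MB/s to bits/s, assuming 1 MB = 10**6 bytes."""
    return mb_per_s * 8e6


def packet_injection_rate(bandwidth_bits_per_s, packet_size_flits,
                          flit_size_bits, clock_hz):
    """pir(c) in packets per clock cycle for one communication c."""
    return bandwidth_bits_per_s / (packet_size_flits * flit_size_bits * clock_hz)


def scaled_pir(pir_factor, base_pir):
    """pir value actually simulated at a given pir factor p, i.e. p * pir(c)."""
    return pir_factor * base_pir


if __name__ == "__main__":
    # 100 MB/s communication, 8-flit packets (hypothetical), 64-bit flits,
    # 1 GHz clock (hypothetical).
    base = packet_injection_rate(mbytes_to_bits_per_s(100), 8, 64, 1e9)
    print(base)                 # packets per cycle at pir factor 1
    print(scaled_pir(2, base))  # the same communication at pir factor 2
```

With these conventions the units cancel to packets per cycle, which is the form of injection rate a cycle-based simulator typically expects.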
Taking APSRA as the baseline implementation, Fig. 13 summarises the improvements for different traffic scenarios in terms of percentage increase in saturation pir factor and reduction of both average delay and energy consumption. For all traffic scenarios except MMS, the bandwidth for each communicating pair has been randomly generated between 10 and 100 MB/s. As can be observed, on average APSRA-BWL improves the saturation point by 38%, reduces average delay by 43% and energy consumption by 4%.

Figure 12 Simulation results for MMS traffic
a Delay variation
b Throughput variation
c Energy variation

Figure 13 Summary of the results taking APSRA as baseline
a Percent increase in saturation pir
b Percent reduction in average delay
c Percent reduction in energy consumption

Finally, Fig. 14 shows the link utilisation under uniform traffic for APSRA and APSRA-BWL. The link utilisation value is discretised into three levels: low (white), medium (grey) and high (black). As can be observed, when APSRA-BWL is used, link utilisation is more evenly distributed than with APSRA. For instance, when APSRA is used, there are several highly utilised links (black) and many lowly utilised links (white). When APSRA-BWL is used, the traffic flows responsible for the high utilisation of some links are redistributed in favour of lowly utilised links. This is confirmed by the higher number of medium-utilised links when APSRA-BWL is used.

7 Implications for router architecture

In this section we present a router architecture designed to support the proposed routing algorithm (routing function and selection function).

7.1 Router architecture

Fig. 15 shows the architecture of the proposed router for the case of a mesh network topology and minimal routing. The top part of the figure shows the high-level view of the router, whereas the bottom part shows the block diagrams of the modules which implement the routing function and the selection function associated with the west input port. The routing function is implemented by means of a routing table, which is addressed by the destination id. An entry of the routing table contains two main fields: AOC and Pr. AOC encodes the set of admissible output channels that can be used to reach the current destination.
If we consider the west input port, AOC is a four-bit field whose bits indicate which of the output ports among north (N), east (E), south (S) and local (L) can be used to reach the current destination. Pr encodes the probability used by the selection function, as discussed in Section 5.3. The number of bits used to encode Pr determines the precision of the selection function. For instance, using three bits, eight probability levels are possible, the highest being 1.
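To make the role of the two fields concrete, the following Python sketch decodes a routing table entry for the west input port and applies the selection function in (3) to the admissible channels. It is a behavioural illustration, not the hardware of Fig. 15. Several details are assumptions of this example and are not specified in the text: the packing of AOC above Pr in a single integer, the order of the AOC bits, the interpretation of Pr as the probability of the first admissible channel, and the mapping of a three-bit code k to the probability (k + 1)/8. The sketch also assumes at most two admissible channels, as is the case for minimal routing on a mesh.

```python
import random
from itertools import accumulate

# Assumed order of the AOC bits for the west input port (bit 0 = north).
PORTS = ("N", "E", "S", "L")


def decode_entry(entry, pr_bits=3):
    """Split a packed entry into (admissible ports, probability of the first port).

    Assumed layout: Pr occupies the pr_bits least significant bits, AOC the bits above.
    """
    levels = 2 ** pr_bits
    pr_code = entry % levels
    aoc = entry // levels
    ports = [port for i, port in enumerate(PORTS) if (aoc >> i) & 1]
    return ports, (pr_code + 1) / levels  # assumed mapping: code k means (k + 1) / levels


def select(channels, probs, xi):
    """Selection function (3): return the channel whose cumulative interval contains xi."""
    for channel, upper in zip(channels, accumulate(probs)):
        if upper >= xi:
            return channel
    return channels[-1]  # guard against rounding when probs sum to slightly less than 1


def route(entry, rng=random):
    """Pick an output port for one header flit; rng must provide random() in [0, 1)."""
    ports, p_first = decode_entry(entry)
    if len(ports) == 1:
        return ports[0]  # single admissible channel: no selection needed
    return select(ports, [p_first, 1.0 - p_first], rng.random())


if __name__ == "__main__":
    entry = (0b0011 * 8) + 0b101  # AOC = {N, E}, Pr code 5, i.e. probability 0.75 for N
    print(decode_entry(entry))
    print([route(entry) for _ in range(10)])
```

Widening `pr_bits` refines the set of probabilities that can be stored, at the cost of a wider routing table entry.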

Figure 14 Links utilisation under uniform traffic for APSRA and APSRA-BWL

A possible implementation of the selection function reported in (3) is shown in the bottom right corner of Fig. 15. The connector labelled with 1 is used in several parts of the circuit. It is set when the routing function returns more than one admissible output channel. If it is zero, only one admissible output channel can be used. In this case, the selection logic is bypassed and clock gating is used to prevent unnecessary activity of the unused blocks. The DirEncS block converts the one-hot encoding used at the input to the encoding of the selected output channel. If more than one output channel (at most two, because we are considering minimal routing) can be used, a selection must be made. The input pr is shifted left (multiplied) and compared with the current value stored in the linear feedback shift register (LFSR). If it is less, the first output channel is selected; otherwise the second one is selected. This selection is, of course, conditioned by the whrt word, which encodes the reservation status of the output channels determined by the wormhole switching technique. For example, suppose that the north and east output channels are admissible and that north would be selected after the comparator. If the north output channel is reserved but east is not, east will be selected instead. This computation is performed by the DirEncM block, which returns the encoding of the selected output channel.

Figure 15 Block diagram of the router for a mesh network topology. Top view (top), routing function and selection function associated to the west input port (bottom)

7.2 Area, timing and power analysis

A router implementing the deterministic XY routing algorithm, a router implementing adaptive odd-even routing and two table-based routers, one implementing a random selection policy (TB-RND) and the other implementing the load balancing selection policy (TB-LB), have been designed in VHDL, synthesised using Synopsys Design Compiler and mapped on a 90 nm technology library from TSMC. We considered 8 × 8 mesh topology networks and four-flit FIFO input buffers with a flit size of 64 bits. The analysis is carried out at the granularity of the following main blocks.

Arbiter: A general arbiter which manages the situation where several packets simultaneously want to use the same output. In this case, arbitration between them has to be performed. In this implementation a round-robin policy is used.

XBar: A general 5 × 5 crossbar block which allows non-conflicting packets to be routed simultaneously.

Input FIFOs: The FIFO buffers at the inputs of each router. For a mesh topology there are five FIFO buffers in total. We considered four-entry FIFO buffers with an entry size of 64 bits (flit size).

WHRT: This block implements the Wormhole Reservation Table, which stores the output port selected by the routing algorithm for a given input port.

Routing function: The block that gives the set of admissible outputs for the current node and a given destination. As we are considering mesh topologies and minimal routing, the maximum number of admissible outputs is two.

Selection function: This block probabilistically selects one of the outputs from the set of admissible outputs returned by the routing function.

Control: Control logic for sequencing the various activities in the router.

The effectiveness of the proposed selection policy depends on the number of bits used to encode the selection probabilities stored in the routing table (field Pr in the routing table shown in Fig. 15). We used three bits (i.e. eight probability levels, the highest being 1), as no appreciable performance improvement has been observed with more than three bits. For instance, Fig. 16 shows the average delay variation under MMS traffic when different discretisation levels are used to encode the selection probabilities.

Area analysis: Fig. 17a shows the area breakdown for the considered routers. As expected, although a large share of the area is due to FIFO buffers, control logic and arbiter, the impact of the routing table is quite evident. The use of the LB selection function incurs an area overhead on the routing function block (i.e. routing table) and on the selection function block of 56 and 73%, respectively. The overhead in the routing table is due to the additional field Pr, which stores the selection probabilities used by the selection function. However, as the input FIFO buffers dominate the area, globally this overhead translates to only approximately 8% of the overall router area.

Figure 16 Average delay variation under MMS traffic when different discretisation levels are used to encode selection probabilities

Power analysis: Average power dissipation values of the main blocks composing the four routers are shown in Fig. 17b. Once again, the main contribution to power dissipation is due to the FIFO buffers. The second highest contribution is due to the crossbar. The power dissipated by routing tables is 8 and 3% higher than that dissipated by the routing blocks implementing XY and odd-even, respectively. With regard to the selection function, the power dissipated by the LB selection function is about 80% higher than that dissipated by a random selection function. In terms of global router power dissipation, the routing table contributes about 12% and the LB selection function 6%. It should be pointed out that both the routing table and the selection function block are active only when a header flit is processed. In fact, the above analysis is very conservative, as it assumes that all the blocks in the router have the same utilisation factor (worst-case analysis).
In practical cases, the power contributions of the routing table and the LB selection function are likely to be lower than those reported above.

Timing analysis: Fig. 17c shows the delay of the different blocks composing the four routers. We considered a five-stage pipeline implementation of the router with the following stages: FIFO, routing, selection, arbitration and crossbar. In this case the clock frequency is determined by the FIFO stage, except for the router implementing odd-even routing, whose slowest stage is routing. The access to the routing table and the computation of the LB selection function do not affect the router clock frequency.

Figure 17 Comparison between routers implementing XY routing, odd-even routing and table based router with random selection policy and balancing selection policy
a Breakdown of area
b Breakdown of power dissipation
c Breakdown of delay
d Area, timing and power values normalised with respect to a router implementing XY routing

7.3 Summary of the architectural implications

Fig. 17d compares the different routers in terms of area, delay and power. Values are normalised with respect to a router implementing XY routing. In terms of area, RT-LB is 8% bigger than a classic table-based router implementing a random selection function. This difference is mainly due to the increased width of the routing table, which needs to store the selection probabilities required by the LB selection function. In terms of timing, the larger routing table and the LB selection function do not impact the clock frequency of the router, as the slowest pipeline stage continues to be the FIFO stage. The average power dissipation of the RT-LB router is 14, 7 and 3% higher than that of the XY, odd-even and RT-RND routers, respectively. However, as shown in the experimental section, the performance improvement obtained using the proposed routing and selection functions results in an overall saving in energy consumption. This is because, although an RT-LB router is more power hungry than the other routers, a network built with RT-LB routers requires fewer cycles to drain a given volume of traffic, with a consequent reduction in energy consumption: the average energy consumed in a period of time is the product of the average power dissipation and the duration of the period.

8 Conclusions

An application-specific routing algorithm has the potential to provide substantially higher communication performance than general purpose routing algorithms.
In this paper we have presented an extension to the APSRA methodology for designing highly adaptive, bandwidth-aware, application-specific, deadlock-free routing algorithms for NoC platforms. The basic idea behind the approach is to exploit communication bandwidth information to customise the routing algorithm for a given application. The approach is divided into two phases. In the first phase, information on the communication bandwidth required between pairs of cores is used both in the heuristic that removes cycles in the ASCDG to ensure deadlock freedom and in deciding the selection probabilities for the various paths available to a communication. This helps the resulting routing algorithm to achieve high adaptivity while spreading the traffic uniformly over the network links. In the second phase, the routing function is further restricted in an iterative manner to reduce the load on overloaded network links. The approach has been evaluated on both synthetic and real traffic scenarios. The results obtained show that the routing algorithm generated by the proposed approach (i) is highly adaptive, (ii) reduces the variation of load in the network links and (iii) ensures that the link


PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs Jindun Dai *1,2, Renjie Li 2, Xin Jiang 3, Takahiro Watanabe 2 1 Department of Electrical Engineering, Shanghai Jiao

More information

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ

A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ A Simple and Efficient Mechanism to Prevent Saturation in Wormhole Networks Λ E. Baydal, P. López and J. Duato Depto. Informática de Sistemas y Computadores Universidad Politécnica de Valencia, Camino

More information

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults 1/45 1/22 MICRO-46, 9 th December- 213 Davis, California udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults Ritesh Parikh and Valeria Bertacco Electrical Engineering & Computer

More information

Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks

Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks 2080 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 11, NOVEMBER 2012 Randomized Partially-Minimal Routing: Near-Optimal Oblivious Routing for 3-D Mesh Networks Rohit Sunkam

More information

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip ASP-DAC 2010 20 Jan 2010 Session 6C Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip Jonas Diemer, Rolf Ernst TU Braunschweig, Germany diemer@ida.ing.tu-bs.de Michael Kauschke Intel,

More information

Routing Algorithms. Review

Routing Algorithms. Review Routing Algorithms Today s topics: Deterministic, Oblivious Adaptive, & Adaptive models Problems: efficiency livelock deadlock 1 CS6810 Review Network properties are a combination topology topology dependent

More information

ACCORDING to the International Technology Roadmap

ACCORDING to the International Technology Roadmap 420 IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, VOL. 1, NO. 3, SEPTEMBER 2011 A Voltage-Frequency Island Aware Energy Optimization Framework for Networks-on-Chip Wooyoung Jang,

More information

Design and Test Solutions for Networks-on-Chip. Jin-Ho Ahn Hoseo University

Design and Test Solutions for Networks-on-Chip. Jin-Ho Ahn Hoseo University Design and Test Solutions for Networks-on-Chip Jin-Ho Ahn Hoseo University Topics Introduction NoC Basics NoC-elated esearch Topics NoC Design Procedure Case Studies of eal Applications NoC-Based SoC Testing

More information

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms CS252 Graduate Computer Architecture Lecture 16 Multiprocessor Networks (con t) March 14 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

Noxim the NoC Simulator

Noxim the NoC Simulator Noxim the NoC Simulator User Guide http://www.noxim.org/ (C) 2005-2010 by the University of Catania Maurizio Palesi, PhD Email: mpalesi@diit.unict.it Home: http://www.diit.unict.it/users/mpalesi/ Davide

More information

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network 1 Global Adaptive Routing Algorithm Without Additional Congestion ropagation Network Shaoli Liu, Yunji Chen, Tianshi Chen, Ling Li, Chao Lu Institute of Computing Technology, Chinese Academy of Sciences

More information

Deadlock and Router Micro-Architecture

Deadlock and Router Micro-Architecture 1 EE482: Advanced Computer Organization Lecture #8 Interconnection Network Architecture and Design Stanford University 22 April 1999 Deadlock and Router Micro-Architecture Lecture #8: 22 April 1999 Lecturer:

More information

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem Reading W. Dally, C. Seitz, Deadlock-Free Message Routing on Multiprocessor Interconnection Networks,, IEEE TC, May 1987 Deadlock F. Silla, and J. Duato, Improving the Efficiency of Adaptive Routing in

More information

OASIS NoC Architecture Design in Verilog HDL Technical Report: TR OASIS

OASIS NoC Architecture Design in Verilog HDL Technical Report: TR OASIS OASIS NoC Architecture Design in Verilog HDL Technical Report: TR-062010-OASIS Written by Kenichi Mori ASL-Ben Abdallah Group Graduate School of Computer Science and Engineering The University of Aizu

More information

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS 1 JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS Shabnam Badri THESIS WORK 2011 ELECTRONICS JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

More information

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

NoC Test-Chip Project: Working Document

NoC Test-Chip Project: Working Document NoC Test-Chip Project: Working Document Michele Petracca, Omar Ahmad, Young Jin Yoon, Frank Zovko, Luca Carloni and Kenneth Shepard I. INTRODUCTION This document describes the low-power high-performance

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

DUE to the increasing computing power of microprocessors

DUE to the increasing computing power of microprocessors IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 13, NO. 7, JULY 2002 693 Boosting the Performance of Myrinet Networks José Flich, Member, IEEE, Pedro López, M.P. Malumbres, Member, IEEE, and

More information

Real-Time Mixed-Criticality Wormhole Networks

Real-Time Mixed-Criticality Wormhole Networks eal-time Mixed-Criticality Wormhole Networks Leandro Soares Indrusiak eal-time Systems Group Department of Computer Science University of York United Kingdom eal-time Systems Group 1 Outline Wormhole Networks

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns H. H. Najaf-abadi 1, H. Sarbazi-Azad 2,1 1 School of Computer Science, IPM, Tehran, Iran. 2 Computer Engineering

More information

Flow Control can be viewed as a problem of

Flow Control can be viewed as a problem of NOC Flow Control 1 Flow Control Flow Control determines how the resources of a network, such as channel bandwidth and buffer capacity are allocated to packets traversing a network Goal is to use resources

More information

in Oblivious Routing

in Oblivious Routing Static Virtual Channel Allocation in Oblivious Routing Keun Sup Shim, Myong Hyon Cho, Michel Kinsy, Tina Wen, Mieszko Lis G. Edward Suh (Cornell) Srinivas Devadas MIT Computer Science and Artificial Intelligence

More information

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching.

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching. Switching/Flow Control Overview Interconnection Networks: Flow Control and Microarchitecture Topology: determines connectivity of network Routing: determines paths through network Flow Control: determine

More information

Demand Based Routing in Network-on-Chip(NoC)

Demand Based Routing in Network-on-Chip(NoC) Demand Based Routing in Network-on-Chip(NoC) Kullai Reddy Meka and Jatindra Kumar Deka Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, India Abstract

More information

Network-Adaptive Video Coding and Transmission

Network-Adaptive Video Coding and Transmission Header for SPIE use Network-Adaptive Video Coding and Transmission Kay Sripanidkulchai and Tsuhan Chen Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213

More information

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Maheswari Murali * and Seetharaman Gopalakrishnan # * Assistant professor, J. J. College of Engineering and Technology,

More information

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation

More information

International Journal of Research and Innovation in Applied Science (IJRIAS) Volume I, Issue IX, December 2016 ISSN

International Journal of Research and Innovation in Applied Science (IJRIAS) Volume I, Issue IX, December 2016 ISSN Comparative Analysis of Latency, Throughput and Network Power for West First, North Last and West First North Last Routing For 2D 4 X 4 Mesh Topology NoC Architecture Bhupendra Kumar Soni 1, Dr. Girish

More information

Adaptations of the A* Algorithm for the Computation of Fastest Paths in Deterministic Discrete-Time Dynamic Networks

Adaptations of the A* Algorithm for the Computation of Fastest Paths in Deterministic Discrete-Time Dynamic Networks 60 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 3, NO. 1, MARCH 2002 Adaptations of the A* Algorithm for the Computation of Fastest Paths in Deterministic Discrete-Time Dynamic Networks

More information

Mapping of Real-time Applications on

Mapping of Real-time Applications on Mapping of Real-time Applications on Network-on-Chip based MPSOCS Paris Mesidis Submitted for the degree of Master of Science (By Research) The University of York, December 2011 Abstract Mapping of real

More information

CHAPTER 3 EFFECTIVE ADMISSION CONTROL MECHANISM IN WIRELESS MESH NETWORKS

CHAPTER 3 EFFECTIVE ADMISSION CONTROL MECHANISM IN WIRELESS MESH NETWORKS 28 CHAPTER 3 EFFECTIVE ADMISSION CONTROL MECHANISM IN WIRELESS MESH NETWORKS Introduction Measurement-based scheme, that constantly monitors the network, will incorporate the current network state in the

More information

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal Lecture 19 Interconnects: Flow Control Winter 2018 Subhankar Pal http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin, Narayanasamy, Nowatzyk,

More information

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks Jose Duato Abstract Second generation multicomputers use wormhole routing, allowing a very low channel set-up time and drastically reducing

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N.

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N. Interconnection topologies (cont.) [ 10.4.4] In meshes and hypercubes, the average distance increases with the dth root of N. In a tree, the average distance grows only logarithmically. A simple tree structure,

More information

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus)

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Routing Algorithm How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Many routing algorithms exist 1) Arithmetic 2) Source-based 3) Table lookup

More information

Fault-adaptive routing

Fault-adaptive routing Fault-adaptive routing Presenter: Zaheer Ahmed Supervisor: Adan Kohler Reviewers: Prof. Dr. M. Radetzki Prof. Dr. H.-J. Wunderlich Date: 30-June-2008 7/2/2009 Agenda Motivation Fundamentals of Routing

More information

Destination-Based Adaptive Routing on 2D Mesh Networks

Destination-Based Adaptive Routing on 2D Mesh Networks Destination-Based Adaptive Routing on 2D Mesh Networks Rohit Sunkam Ramanujam University of California, San Diego rsunkamr@ucsdedu Bill Lin University of California, San Diego billlin@eceucsdedu ABSTRACT

More information

Chapter 2 Designing Crossbar Based Systems

Chapter 2 Designing Crossbar Based Systems Chapter 2 Designing Crossbar Based Systems Over the last decade, the communication architecture of SoCs has evolved from single shared bus systems to multi-bus systems. Today, state-of-the-art bus based

More information

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Bhavya K. Daya, Li-Shiuan Peh, Anantha P. Chandrakasan Dept. of Electrical Engineering and Computer

More information

Conquering Memory Bandwidth Challenges in High-Performance SoCs

Conquering Memory Bandwidth Challenges in High-Performance SoCs Conquering Memory Bandwidth Challenges in High-Performance SoCs ABSTRACT High end System on Chip (SoC) architectures consist of tens of processing engines. In SoCs targeted at high performance computing

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP 1 M.DEIVAKANI, 2 D.SHANTHI 1 Associate Professor, Department of Electronics and Communication Engineering PSNA College

More information

Computation of Multiple Node Disjoint Paths

Computation of Multiple Node Disjoint Paths Chapter 5 Computation of Multiple Node Disjoint Paths 5.1 Introduction In recent years, on demand routing protocols have attained more attention in mobile Ad Hoc networks as compared to other routing schemes

More information

Phastlane: A Rapid Transit Optical Routing Network

Phastlane: A Rapid Transit Optical Routing Network Phastlane: A Rapid Transit Optical Routing Network Mark Cianchetti, Joseph Kerekes, and David Albonesi Computer Systems Laboratory Cornell University The Interconnect Bottleneck Future processors: tens

More information

Topology basics. Constraints and measures. Butterfly networks.

Topology basics. Constraints and measures. Butterfly networks. EE48: Advanced Computer Organization Lecture # Interconnection Networks Architecture and Design Stanford University Topology basics. Constraints and measures. Butterfly networks. Lecture #: Monday, 7 April

More information

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS 1 Reconfigurable Routers for Low Power and High Performance Débora Matos, Student Member, IEEE, Caroline Concatto, Student Member, IEEE,

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture On-chip Networking Prof. Michel A. Kinsy Virtual Channel Router VC 0 Routing Computation Virtual Channel Allocator Switch Allocator Input Ports VC x VC 0 VC x It s a system

More information

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing Jose Flich 1,PedroLópez 1, Manuel. P. Malumbres 1, José Duato 1,andTomRokicki 2 1 Dpto.

More information

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks HPI-DC 09 Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks Diego Lugones, Daniel Franco, and Emilio Luque Leonardo Fialho Cluster 09 August 31 New Orleans, USA Outline Scope

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Outline Introduction Overview of WiNoC system architecture Overlaid

More information

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip Nasibeh Teimouri

More information

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER A Thesis by SUNGHO PARK Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

Networks-on-Chip Router: Configuration and Implementation

Networks-on-Chip Router: Configuration and Implementation Networks-on-Chip : Configuration and Implementation Wen-Chung Tsai, Kuo-Chih Chu * 2 1 Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 413, Taiwan,

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information