Area and Power-efficient Innovative Network-on-Chip Architecture


Chifeng Wang, Wen-Hsiang Hu, Nader Bagherzadeh
Dept. of Electrical Engineering and Computer Science
University of California, Irvine
Irvine, CA, USA
{chifengw, wenhsiah,

Seung Eun Lee
Integrated Platforms Research, Intel Labs
Hillsboro, OR

Abstract: This paper proposes a novel Network-on-Chip (NoC) architecture that not only enhances network transmission performance while keeping implementation cost feasible, but also provides a power-efficient solution for on-chip interconnection networks. The diagonally-linked mesh (DMesh) NoC, which uses the wormhole packet switching technique, implements a high-performance NoC platform that meets both cost and power consumption requirements. The proposed architecture uses an adaptive quasi-minimal routing algorithm whose flexibility and adaptiveness improve average latency and saturation traffic load. In addition, implementation results show that employing diagonal links is a more area-efficient way of improving network performance than using large buffers. Simulation results also reveal that DMesh networks consume less power than traditional mesh networks.

Keywords: Network-on-Chip (NoC); interconnection network; power-efficient; power-optimization; area-efficient; system-on-chip (SoC)

I. INTRODUCTION

As semiconductor technology continues its phenomenal growth and follows Moore's Law, the amount of computation power and storage that can be integrated on a chip increases. Several multi-core integrated circuit designs, such as a 64-core SoC and an 80-core NoC architecture [1][2], have been proposed recently. The NoC interconnection scheme has been proposed as a unified solution for designs in advanced process technology [3][4]. An NoC uses a distributed control mechanism, resulting in a scalable interconnection network.
The use of standardized sockets enables modular design and intellectual property (IP) reuse, and system predictability can be obtained by using the guaranteed service feature of an NoC. Therefore, there is growing interest in NoC research [5][6] because of its impact on next-generation SoC designs.

We have recently developed a multi-processor system platform called Network-based Processor Array (NePA) [7], in which processors are interconnected by an on-chip two-dimensional (2D) mesh network. NePA is a deadlock-free and livelock-free network that implements the wormhole packet switching technique and utilizes an adaptive minimal routing algorithm. To further improve the performance of the NePA architecture, we propose adding diagonal links to the 2D mesh network, motivated by the emergence of the X-architecture routing technique in chip manufacturing [8][9]. The diagonal links not only reduce the distance between source and destination nodes but also alleviate traffic congestion in the network, so network performance is enhanced. We refer to the resulting NoC architecture as DMesh: Diagonally-linked Mesh. Experimental results show that DMesh improves the average latency and the saturation traffic load for mesh networks of various sizes. In addition, implementation and power simulation results show that diagonal links are an area-efficient and power-efficient way to enhance network transmission performance.

The rest of this paper is organized as follows. Section II discusses background material on NoC architecture and prior related research. Section III proposes the DMesh NoC architecture, and Section IV validates the new architecture with experimental results using different traffic patterns. Section V concludes the paper.

II. BACKGROUND

In this section, we discuss the basic background for NoC architectures and review some related work in this field, followed by an overview of the NePA platform.

A. NoC Architecture

The function of an on-chip network is to deliver messages from a source node to a destination node, and there are many design alternatives for accomplishing this job. How to choose a suitable network architecture for given application requirements remains an open research problem. Here we discuss the network properties that need to be considered when devising an NoC architecture for specific applications.

1) Switching Technique

There are two major switching techniques: circuit switching and packet switching. Circuit switching establishes a link between the source and destination nodes, either virtually or physically, before a message is transferred. The link is held until all the data are transmitted. The major advantages of circuit switching are that there is no contention delay during message transmission and that its behavior is more predictable, so circuit switching is usually employed when Quality of Service (QoS) is a concern. Examples of this technique are [10] and [11].

On the other hand, packet switching transfers messages on a per-hop basis. With packet switching, messages are divided into packets at the source node and then sent into the network. Packets move along a route determined by the routing algorithm, traverse a series of network nodes, and finally arrive at the destination node. Packet switching is used in most NoC platforms because of its potential for providing simultaneous data communication between many source-destination pairs. It can be further classified into three classes: store-and-forward (SAF), virtual cut-through (VCT), and wormhole switching. The most commonly used approach for an NoC architecture is wormhole switching, because it only requires buffering for one transmission unit, called a flit, so the area cost of a router can be kept low. In contrast, SAF and VCT require buffers large enough to hold a whole packet, which prohibits their adoption.

2) Topology Development

Topology defines how nodes are placed and connected, affecting the bandwidth and latency of a network. Many different topologies have been proposed [6], such as mesh, torus, mixed and custom topologies, as shown in Fig. 1. Some researchers have proposed application-specific topologies that can offer superior performance while minimizing area and energy consumption [12]. The most common topologies are the 2D mesh and torus, because their grid-type shapes and regular structure are the most appropriate for a two-dimensional layout on a chip.

3) Routing Policy

Routing is the mechanism responsible for determining the path that a packet traverses from the source node to the destination node. Both deterministic and adaptive routing algorithms have been proposed. With deterministic routing, the path between a source-destination pair is fixed, regardless of the current state of the network.
On the other hand, an adaptive routing algorithm takes the network state into account when deciding a route, so the routing path varies with time. For example, it may choose an alternative path when congestion occurs. This explains why it has the potential to support more traffic on the same network topology.

Figure 1. NoC topologies: mesh, torus, mixed, custom.

Figure 2. Example of a 4x4 NePA network (labels: PE, execution unit, program RAM, data RAM, processor core).

B. System Platform

The NePA platform is a scalable, flexible and reconfigurable multiprocessor platform that meets high-performance and low-power requirements. NePA implements the wormhole packet switching technique and is based on a 2D mesh topology, as shown in Fig. 2. Each node in NePA consists of a router and a local IP, which can be a CPU, DSP, memory block, or application-specific logic. The router connects with its four neighboring routers via six bidirectional links. A key feature of the NePA architecture is the use of two separate vertical links, which are employed to construct a deadlock-free network [14]. The network is actually composed of two disjoint sub-networks. One sub-network is responsible for delivering east-bound packets while the other is for west-bound packets. Therefore, there are no cycles in the resource dependence graph [15], which prevents deadlocks. This method reduces the design complexity of the router because there is no need for a deadlock-aware routing algorithm. To increase network performance, NePA uses an adaptive XY routing scheme. When an output port is congested or the output buffer is full, the router selects an alternative output port for packets. Therefore, link utilization is balanced and network performance improves.

III. ARCHITECTURE

A. System Platform Description

The DMesh network is constructed by adding diagonal links to NePA, as presented in Fig. 3. Each node has corresponding 64-bit bidirectional links connecting it with its neighbors.
Additionally, there are three ports for connection with local PEs: IntR, IntL and Int. Fig. 4 depicts the input and output ports of the NePA router and the DMesh router. The DMesh network is composed of two sub-networks: the E-subnet and the W-subnet, represented by dashed arrows and solid arrows in Fig. 3, respectively. The E-subnet is responsible for transferring eastward packets while the W-subnet is responsible for westward traffic. When a source PE starts packet transmission, it injects packets into the network via the IntR or IntL port, depending on the direction of the destination PE. The IntR port injects packets into the E-subnet and the IntL port injects packets into the W-subnet. Subsequently, packets traverse one of the sub-networks to their destinations. When packets arrive at the destination node, they are ejected from the Int port.
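The injection rule described above can be illustrated with a short Python sketch. It is not the authors' router logic: the function name `injection_port`, the convention that the x coordinate grows eastward, and the handling of a destination in the same column (which the text does not specify) are all assumptions made for illustration.

```python
def injection_port(src_x: int, dst_x: int) -> str:
    """Pick the local injection port from the horizontal direction of travel.

    Assumes x coordinates increase toward the east (hypothetical convention).
    """
    dx = dst_x - src_x
    if dx > 0:
        return "IntR"  # eastward traffic is injected into the E-subnet
    if dx < 0:
        return "IntL"  # westward traffic is injected into the W-subnet
    # Assumption: the text does not say which subnet carries traffic whose
    # destination is in the same column; this sketch arbitrarily picks IntR.
    return "IntR"
```

A source at column 1 sending to column 3 would use IntR under these assumptions, and a source at column 3 sending to column 0 would use IntL.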

Figure 3. DMesh network and links.

Figure 4. Port description of the NePA router and the DMesh router (panels a and b).

Figure 5. Example of routing path selection: (a) minimal routing, (b) quasi-minimal routing.

B. Packet Format

With wormhole packet switching, packets are composed of 64-bit flits. We use the same packet format as defined in [14]. There are four types of packets used for transmission. The single data transfer packet is one flit in size and is used for transferring 32-bit data. The single command packet is for building control-specific protocols between processor elements (PEs) or routers. Multiple data packets, which are used for transmitting large amounts of data, have lower communication overhead than repeated single data transmissions. The block program/data transfer packets consist of a header flit and several body flits that contain the actual program or data. The address of the destination PE is represented in the X-dir (horizontal) field and the Y-dir (vertical) field in a relative distance format. For instance, if the destination node is on the east side of the source node, the X-dir field has a positive value. The X-dir field has a negative value if the destination node is on the west side of the source node. Relative address representation reduces router design effort because the same router can be used at all network nodes without modification. We also incorporated the seq_num field in the packet for reordering packets delivered out of order. The single/block data transfer packet has the sourcepe_address field and the application-dependent data_id field so that the destination PE can identify received data.

C. Routing Algorithm

A distributed adaptive routing algorithm is applied in DMesh. With distributed routing, the next hop is selected at the current node, and path selection is based on a quasi-minimal routing technique. As Fig. 5 illustrates, if a packet is transferred from node S to node D, there are three alternative paths: a, b, and c. Obviously, path_a is the shortest path. Our preliminary simulation shows that if a minimal routing algorithm is adopted, so that the shortest paths are always used, there is severe congestion on diagonal links and low utilization of vertical and horizontal links. Network performance is thus degraded by the unbalanced traffic load. Rather than strictly following a minimal routing strategy, the quasi-minimal routing technique relaxes the output port selection. In this example, it allows packets to take path_b or path_c instead, depending on the congestion status of path_a. Although the packet may traverse a longer path, this approach helps balance traffic loads and relieves congestion. To resolve contention at an output port, we employ a fixed-priority arbitration scheme, which simplifies implementation. In DMesh networks, the diagonal input ports are given higher priority than the horizontal and vertical input ports, and IntR and IntL have the lowest priority.

IV. EVALUATION

To demonstrate the router's performance and its feasibility for VLSI implementation, we describe the methodology used to analyze the performance, area cost and power consumption of the DMesh architecture and present simulation results that validate the new architecture.

A. Experimental Setup

The DMesh platform is modeled in a SystemC-based cycle-accurate simulator called enoc. In the simulator, we can easily set up various network configurations such as network size, topology, buffer size, routing algorithm, priority scheme for router arbitration, and traffic patterns. The traffic generator produces four different synthetic traffic patterns used for measuring performance: {uniform random, bit complement, bit reverse, matrix transpose}. These patterns define the spatial distribution of packets.
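The paper does not spell out how each synthetic pattern maps a source to a destination. The Python sketch below uses the definitions commonly found in the interconnection network literature, which may differ in detail from those in the simulator. The function names, the node numbering (node id = y * width + x with a power-of-two width, so a node id is a `bits`-wide integer), and the exclusion of self-addressed packets in the uniform random pattern are assumptions.

```python
import random


def bit_complement(src: int, bits: int) -> int:
    """Destination is the bitwise complement of the source address."""
    return ~src & ((1 << bits) - 1)


def bit_reverse(src: int, bits: int) -> int:
    """Destination is the source address with its bit order reversed."""
    dst = 0
    for i in range(bits):
        if (src >> i) & 1:
            dst |= 1 << (bits - 1 - i)
    return dst


def matrix_transpose(src: int, bits: int) -> int:
    """Swap the x and y halves of the address, i.e. (x, y) -> (y, x)."""
    half = bits // 2
    low = src & ((1 << half) - 1)   # x coordinate under the assumed numbering
    high = src >> half              # y coordinate under the assumed numbering
    return (low << half) | high


def uniform_random(src: int, num_nodes: int, rng: random.Random) -> int:
    """Pick any node except the source with equal probability (assumption)."""
    dst = rng.randrange(num_nodes - 1)
    return dst if dst < src else dst + 1
```

For a 4x4 network (`bits = 4`) under these assumptions, node 7 at (x=3, y=1) sends to node 13 at (x=1, y=3) under matrix transpose, so source and destination lie on a diagonal of the mesh.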
To apply a temporal distribution to transmitted packets, we adopted self-similar traffic generation techniques. Self-similar traffic has been observed between on-chip modules in MPEG-2 video applications [16] and in conventional computer networks [17]. Researchers [18] have shown that self-similar traffic can be generated by aggregating a large number of packet sources that exhibit a long-range dependence property. We used the modeling method proposed in [19] to produce the self-similar traffic. An ON/OFF state is imposed on each source node to control traffic generation during simulation time. The length of time a node spends in the ON or OFF state is determined by the Pareto distribution ($F(x) = 1 - x^{-\alpha}$, $1 < \alpha < 2$). The following equations use the parameters $\alpha_{ON}$ and $\alpha_{OFF}$ to calculate the ON and OFF times:

$$T_{ON} = U^{-1/\alpha_{ON}} \tag{1}$$

$$T_{OFF} = U^{-1/\alpha_{OFF}} \tag{2}$$

where $U$ is a uniformly distributed value in the range $(0, 1]$, and $\alpha_{ON} = 1.9$ and $\alpha_{OFF} = 1.25$ are set in the simulation.

Although synthetic traffic gives an objective basis for simulation, practical cases are still essential to validate new schemes. Our simulations therefore also adopt realistic application patterns from the SPLASH-2 [20] benchmarks, which suit 7x7 networks and provide several different application behaviors: {barnes, fft, lu, ocean, radix, raytrace, water-nsquared, water-spatial}. These are versatile applications that are commonly used for benchmarking.

In this simulation effort, a standard interconnection network measurement setup was used [15]. After packets are generated, they are stored in an infinite queue at the source node and wait until they are injected into the network. This mechanism, referred to as the open-loop measurement configuration, isolates packet generation from network behavior, i.e. packet generation is independent of the network condition. Each simulation executes 1, clock cycles for warm-up and then continues for 1, cycles during which router performance and power consumption measurements are conducted.

B. Performance Evaluation

Latency and throughput are the major performance evaluation criteria. Performance evaluation is based on transmission time and traffic load among source and destination pairs. Each flit in enoc is declared as a SystemC object that carries the following time stamp and distance variables: generation time ($T_g$), injection time ($T_i$), arrival time ($T_a$) and inter-node distance ($D$). Inter-node distance is expressed as the number of hops between the source-destination pair.
With this information, the transmission latency ($T_L$), queuing delay ($T_Q$), and blocking time ($T_B$) of each flit can be calculated by the following equations:

$$T_L = T_a - T_g \tag{3}$$

$$T_Q = T_i - T_g \tag{4}$$

$$T_B = T_a - T_i - (D \cdot T_{clk}) \tag{5}$$

where $T_L$ is the latency between the source and destination nodes; $T_Q$ is the waiting time between generation and injection; $T_B$ is the blocking time during which flits are held in buffers waiting for transmission; and $T_{clk}$ is the clock cycle time.

Average inter-node distance correlates closely with transmission performance. Diagonal links are crucial for shortening the distance travelled. Our quasi-minimal routing algorithm takes advantage of these alternative express links, so the inter-node distance is reduced remarkably. We observed the distance to be about a quarter to a half shorter than in the original NePA platform. Source and destination pairs located almost diagonally, as in matrix transpose traffic, show a noticeable improvement. The reduction there is about half the distance, and in this case the routing mechanism uses mostly diagonal links as a result of the symmetry between source-destination pairs.

Transmission time is composed of queuing time, transmit time and blocking time. From the simulation results, we can analyze the distribution of these times under many traffic patterns and for networks of various sizes. The results show that all of these delays are lower in DMesh than in NePA across different traffic loads. Fig. 6 and Fig. 7 compare average latency for various traffic patterns and different mesh network sizes. For both network sizes, average latency in DMesh networks is dramatically lower than in NePA networks. In particular, for the 4x4 network under bit reverse and matrix transpose traffic, the latency in DMesh is nearly constant because the network is capable of resolving all routing resource contention before the network load saturates. That is, each source-destination pair can obtain an alternative path that is not occupied by other packets.
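As an illustration of equations (3) to (5) and of the distance reduction discussed above, the following Python sketch computes the per-flit timing breakdown and compares minimal hop counts with and without diagonal links. It is not code from the simulator: the names `FlitTimes`, `latency_breakdown`, `mesh_hops` and `diagonal_mesh_hops` are hypothetical, and the diagonal hop count assumes a strictly minimal path with a diagonal link available in every diagonal direction, whereas quasi-minimal routing may take longer paths.

```python
from dataclasses import dataclass


def mesh_hops(dx: int, dy: int) -> int:
    """Minimal hop count using only horizontal and vertical links."""
    return abs(dx) + abs(dy)


def diagonal_mesh_hops(dx: int, dy: int) -> int:
    """Minimal hop count when each diagonal hop covers one x and one y step.

    Assumption: diagonal links exist in all four diagonal directions.
    """
    return max(abs(dx), abs(dy))


@dataclass
class FlitTimes:
    t_g: float  # generation time T_g
    t_i: float  # injection time T_i
    t_a: float  # arrival time T_a
    hops: int   # inter-node distance D, in hops


def latency_breakdown(flit: FlitTimes, t_clk: float = 1.0):
    """Return (T_L, T_Q, T_B) following equations (3), (4) and (5)."""
    t_l = flit.t_a - flit.t_g                          # eq. (3)
    t_q = flit.t_i - flit.t_g                          # eq. (4)
    t_b = flit.t_a - flit.t_i - flit.hops * t_clk      # eq. (5)
    return t_l, t_q, t_b
```

Under these assumptions, a matrix transpose pair with offsets dx = 2 and dy = -2 needs 4 hops on a plain mesh and 2 hops with diagonal links, which matches the roughly half-distance reduction reported for that pattern. The three quantities also satisfy $T_L = T_Q + T_B + D \cdot T_{clk}$ by construction.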
Saturation load is the point at which throughput no longer grows linearly with traffic load. The simulations compare various configurations, and all of the results lead to the same conclusion: DMesh networks sustain a higher traffic load than NePA networks. Fig. 8 shows the saturation load of 8x8 mesh networks. In each case, DMesh can effectively transmit more data through the network. We also observe that the improvement for 8x8 networks is larger than for 4x4 networks, which implies that DMesh has a greater impact on systems with more nodes. FIFO sizes are also evaluated in the simulations. The results show that increasing FIFO size does not help much in supporting more traffic or in improving saturation load.

Figure 6. Average latency in 4x4 mesh networks (FIFO depth = 4): (a) 4x4 random, (b) 4x4 bit complement, (c) 4x4 bit reverse, (d) 4x4 matrix transpose.

Figure 7. Average latency in 8x8 mesh networks (FIFO depth = 4): (a) 8x8 random, (b) 8x8 bit complement, (c) 8x8 bit reverse, (d) 8x8 matrix transpose.

Figure 8. Saturation load in 8x8 mesh networks (FIFO depth = 4), by traffic pattern: random, bit-complement, bit-reverse, matrix transpose.

C. Implementation Cost Evaluation

Feasibility is also evaluated by estimating hardware cost. The DMesh router has been designed at Register-Transfer Level (RTL) in Verilog™ HDL. A logic description of the router has been obtained using Synopsys™ Design Compiler and the TSMC™ 65nm CMOS generic process technology to perform logic synthesis and analyze hardware cost. Various buffer size implementations were also studied. For NePA routers, we referred to the architecture described in [7]. The DMesh router block diagram is presented in Fig. 9. DMesh routers have three sub-routers for processing traffic to the E-subnet, W-subnet and Int output ports. There is a corresponding FIFO associated with each input port. The header processing unit (HPU) processes destination information from the header flit, and the arbiter logic (AL) decides the routing path, performs arbitration and manages the crossbar switch. Hardware synthesized frequency is set to 8 MHz. Area and power results are shown in Fig. 10. From the figure, we can observe that the gate counts and power consumption of DMesh routers with a FIFO depth of 4 or 8 flits are roughly equal to or less than those of NePA routers with a FIFO depth of 8 or 16 flits.

Figure 9. Block diagram of the DMesh router.

Figure 10. Implementation cost and power comparison (gate counts and power in mW versus FIFO depth in flits).

Figure 11. Average latency in 4x4 mesh networks with similar router cost: (a) 4x4 random, (b) 4x4 bit complement, (c) 4x4 bit reverse, (d) 4x4 matrix transpose.

Figure 12. Average latency in 8x8 mesh networks with similar router cost: (a) 8x8 random, (b) 8x8 bit complement, (c) 8x8 bit reverse, (d) 8x8 matrix transpose.

Performance comparisons of different configurations are shown in Figures 11 and 12. It is evident that DMesh networks have shorter latency than NePA networks of nearly equivalent hardware cost. For example, for 8x8 mesh networks under random traffic, DMesh with a FIFO depth of 4 flits has a shorter latency than NePA with a FIFO depth of 8 or 16 flits. All other configurations give similar results. The results also show that the improvement from diagonal links is larger than that from larger buffers. From this point of view, when performance is compared at nearly equivalent hardware cost, DMesh is a more area-efficient architecture than the alternatives.

D. Power Consumption Evaluation

A statistical power modeling methodology is used to extract the power model from the physical implementations of the NePA and DMesh networks [13][21]. Power analysis is performed with the Synopsys PrimeTime™ PX tool [22], which analyzes NoC platform power consumption from waveforms at nanosecond resolution. Furthermore, regression analysis is used to characterize the related parameters and form the power equation. Based on this methodology, a power model is produced and incorporated into our SystemC simulator so that we can estimate performance and power consumption simultaneously for each benchmark.

Figure 13. Comparison of average latency and power consumption for synthetic traffic patterns (random, bit complement, matrix transpose, bit reverse): (a) average latency, (b) average power consumption (mW).

Figure 14. Comparison of average latency and power consumption for SPLASH-2 traffic patterns (barnes, fft, lu, ocean, radix, raytrace, water-nsquared, water-spatial): (a) average latency, (b) average power consumption (mW).

Power consumption in an NoC platform is mainly attributed to the router and FIFO components. We therefore simplify the NoC system power model to a combination of a router model and a FIFO model, as follows:

$$P_{router} = \alpha_0 + \alpha_H \Psi_H + \alpha_S \Psi_S + \alpha_S \Psi_S \tag{6}$$

where $\Psi_H$ is the Hamming distance of outgoing flits; the two $\Psi_S$ terms are, respectively, the number of outgoing ports passing body flits and the number of state transitions of outgoing ports; $\alpha_H$ and the two $\alpha_S$ coefficients are the regression coefficients for the corresponding variables; and $\alpha_0$ is the power term that is independent of the variables.

$$P_{FIFO} = \alpha_0 + \alpha_{HW} \Psi_{HW} + \alpha_{HR} \Psi_{HR} \tag{7}$$

where $\Psi_{HW}$ is the Hamming distance of written flits; $\Psi_{HR}$ is the Hamming distance of read flits; $\alpha_{HW}$ and $\alpha_{HR}$ are the regression coefficients for the corresponding variables; and $\alpha_0$ is the power term that is independent of the variables.

To evaluate power consumption comprehensively, both synthetic traffic patterns and real application traffic patterns are used to estimate and compare system power consumption for the different router scenarios. Fig. 13 depicts performance and power consumption for synthetic traffic patterns in 8x8 mesh networks. The results reveal that DMesh networks reduce both latency and power consumption compared with NePA networks. Although the power consumption of individual routers in DMesh networks is slightly higher than that of NePA routers, flits traverse less distance through DMesh networks and experience fewer FIFO accesses, so the overall power consumption is lower than in networks based on traditional horizontal and vertical links. Latency in DMesh networks is about seventy-seven to ninety percent of that in NePA networks, which means that flits in DMesh either traverse fewer hops or are blocked for less time than in NePA. The results also imply that the alternative express links in the diagonal direction shorten the travelling distance between source and destination nodes. These characteristics also explain why system power consumption is about seventy to eighty percent of that of NePA networks.

For application patterns, the SPLASH-2 benchmarks provide several different application cases for evaluating the networks. Fig. 14 shows that latency in DMesh networks is about sixty to eighty percent of that in NePA networks, and power consumption is about eighty percent, for the different application patterns. Accelerating transmission tasks not only improves the latency and power consumption of the tasks themselves but also enables the system to enter power saving mode for the rest of the data transmission time, lowering overall system power consumption.

V. CONCLUSION AND FUTURE WORK

A flexible router design and an adaptive routing algorithm not only make effective use of area and power, but also support more advanced features to accommodate various services on an NoC platform. We have developed an innovative NoC architecture and demonstrated its performance enhancement over networks built from traditional horizontal and vertical links. Experimental results reveal that DMesh networks outperform NePA networks in transmission latency and saturation load. Implementation results clearly demonstrate that our approach is more area and power efficient than previously reported results. With alternative links employed between routers, we anticipate that DMesh has the potential to support better congestion control schemes, differentiated services and fault tolerance capability to accommodate more diverse services in the future.

Besides performance, more features need to be incorporated into next-generation NoC architectures. Our next steps toward advanced NoC routers are to implement congestion control mechanisms and differentiated services so as to enhance transmission quality. We will also focus on the robustness of the router architecture so that it tolerates unpredictable transmission errors or physical infrastructure failures in an NoC architecture.

REFERENCES

[1] S. Bell et al., "TILE64 Processor: A 64-Core SoC with Mesh Interconnect," IEEE International Solid-State Circuits Conference, Digest of Technical Papers, 2008.
[2] S. Vangal et al., "An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS," IEEE International Solid-State Circuits Conference, Digest of Technical Papers, 2007.
[3] W. J. Dally and B. Towles, "Route packets, not wires: on-chip interconnection networks," Design Automation Conference, Proceedings, 2001.
[4] L. Benini and G. De Micheli, "Networks on chip: a new paradigm for systems on chip design," Design, Automation and Test in Europe Conference and Exhibition, Proceedings, 2002.
[5] T. Bjerregaard and S. Mahadevan, "A survey of research and practices of Network-on-chip," ACM Computing Surveys, vol. 38, p. 1, 2006.
[6] E. Salminen, A. Kulmala, and T. D. Hamalainen, "Survey of Network-on-Chip proposals," White Paper, OCP-IP, March 2008.
[7] J. H. Bahn, S. E. Lee, Y. S. Yang, J. Yang and N. Bagherzadeh, "On Design and Application Mapping of a Network-on-Chip (NoC) Architecture," Parallel Processing Letters, vol. 18, 2008.
[8] M. Igarashi et al., "A Diagonal-Interconnect Architecture and Its Application to RISC Core Design," IEIC Technical Report (Institute of Electronics, Information and Communication Engineers), vol. 12, 2002.
[9] S. L. Teig, "The X architecture: Not your father's diagonal wiring," Proceedings of the 2002 International Workshop on System-level Interconnect Prediction.
[10] K. Chang, J. Shen and T. Chen, "Evaluation and design trade-offs between circuit-switched and packet-switched NOCs for application-specific SOCs," in DAC '06: Proceedings of the 43rd Annual Conference on Design Automation, 2006.
[11] P. Marchal et al., "Spatial division multiplexing: a novel approach for guaranteed throughput on NoCs," Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '05), 2005.
[12] J. Hu, Y. Deng and R. Marculescu, "System-level point-to-point communication synthesis using floorplanning information," in ASP-DAC '02: Proceedings of the 2002 Conference on Asia South Pacific Design Automation/VLSI Design, 2002.
[13] S. E. Lee and N. Bagherzadeh, "A Variable Frequency Link for a Power-Aware Network-on-Chip (NoC)," Integration, the VLSI Journal, Elsevier, 2009.
[14] J. H. Bahn, S. E. Lee and N. Bagherzadeh, "On Design and Analysis of a Feasible Network-on-Chip (NoC) Architecture," Fourth International Conference on Information Technology (ITNG '07), 2007.
[15] W. J. Dally, Principles and Practices of Interconnection Networks. Morgan Kaufmann, 2004.
[16] G. Varatkar and R. Marculescu, "Traffic analysis for on-chip networks design of multimedia applications," in DAC '02: Proceedings of the 39th Conference on Design Automation, 2002.
[17] W. E. Leland, M. S. Taqqu, W. Willinger and D. V. Wilson, "On the self-similar nature of Ethernet traffic (extended version)," IEEE/ACM Transactions on Networking, vol. 2, pp. 1-15.
[18] M. S. Taqqu, W. Willinger and R. Sherman, "Proof of a fundamental result in self-similar traffic modeling," SIGCOMM Comput. Commun. Rev., vol. 27, pp. 5-23.
[19] D. R. Avresky, "Performance evaluation of the ServerNet(R) SAN under self-similar traffic," International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing (IPPS/SPDP), Proceedings.
[20] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh and A. Gupta, "The SPLASH-2 Programs: Characterization and Methodological Considerations," in ISCA '95: Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995.
[21] S. E. Lee and N. Bagherzadeh, "A High-level Power Model for Network-on-Chip (NoC) Router," Computers & Electrical Engineering, Elsevier, 2009.
[22] Synopsys, Synopsys Design Compiler, PrimeTime PX (synopsys.com).
