Area and Power-efficient Innovative Network-on-Chip Architecture

Size: px
Start display at page:

Download "Area and Power-efficient Innovative Network-on-Chip Architecture"

Transcription

1 Area and Power-efficient Innovative Network-on-Chip Architecture Chifeng Wang, Wen-Hsiang Hu, Nader Bagherzadeh Dept. of Electrical Engineering and Computer Science University of California, Irvine Irvine, CA USA {chifengw, wenhsiah, Seung Eun Lee Integrated Platforms Research, Intel Labs Hillsboro OR Abstract This paper proposes a novel Network-on-Chip (NoC) architecture that not only enhances network transmission performance while maintaining implementation cost feasible, but also provides a power-efficient solution for interconnection network scenarios. Diagonally-linked mesh () NoC that uses wormhole packet switching technique implements a highperformance NoC platform to meet both cost and power consumption requirements. The proposed architecture uses an adaptive quasi-minimal routing algorithm so that can improve average latency and saturation traffic load owing to its flexibility and adaptiveness. In addition, implementation results show that employing diagonal links is a more area-efficient way for improving network performance than using large buffers. Simulation results also reveal that power consumption in networks outperforms traditional Mesh networks. Keywords- Network-on-Chip (NoC); interconnection network; power-efficient; power-optimization; area-efficient; system-on-chip (SoC) I. INTRODUCTION As semiconductor technology continues its phenomenal growth and follows the Moore s Law, the amount of computation power and storage that can be integrated on a chip increases. Several multi-core integrated circuit designs such as 64-core SoC and 8-core NoC architecture [1][2] have been proposed recently. NoC interconnection scheme has been proposed as a unified solution for the design of advanced process technology [3][4]. NoC uses a distributed control mechanism, resulting in a scalable interconnection network. The use of standardized sockets enables modular design and intellectual property (IP) reuse when the system predictability can be obtained by using guaranteed service feature of an NoC. Therefore, there is growing interest in NoC research [5][6] because of its impact on the next generation SoC designs. We have recently developed a multi-processor system platform called Network-based Processor Array () [7] in which processors are interconnected by using an on-chip twodimensional (2D) mesh network. is a deadlock-free and livelock-free network that implements wormhole packet switching technique and utilizes an adaptive minimal routing algorithm. To further improve the performance of architecture, diagonal links to the 2D mesh network are proposed because of the emergence of X-architecture routing technique in chip manufacturing [8][9]. The diagonal links not only reduce the distance between source and destination nodes but alleviate traffic congestion in the network so that the network performance is enhanced. We proposed an innovative NoC architecture referred to as : Diagonally-linked Mesh. Experimental results show that improves the average latency and the saturation traffic load in various sizes on mesh networks. In addition, implementation and power simulation results show that diagonal links are more areaefficient and power-efficient ways to enhance network transmission performance and make power consumption efficient. The rest of this paper is organized as follows. In Section 2, we discuss some of the background materials for NoC architecture and prior related research. Section 3 proposes NoC architecture and Section 4 shows the validation of new architecture with experimental results using different traffic patterns. In last section, brief statements conclude this paper. II. BACKGROUND In this section, we discuss the basic background for NoC architectures and provide a review of some related works in this field, as well as an overview of the platform. A. NoC Architecture The function of an on-chip network is to deliver messages from source node to destination node, and there are many design alternatives to accomplish this job. Depending on the application requirements, how to choose a suitable network architecture remains an open problem for research. Here we discuss the network properties that need to be considered when devising an NoC architecture for specific applications. 1) Switching Technique There are two major switching techniques: circuit switching and packet switching. Circuit switching establishes a link between source and destination node either virtually or physically before a message is being transferred. The link is held until all the data are transmitted. Major advantages of circuit switching are that there is no contention delay during message transmission and its behavior is more predictable, so circuit switching is usually employed when Quality of Service (QoS) is considered. Examples of using this technique are [1] and [11].

2 On the other hand, packet switching transfers messages on a per-hop basis. With packet switching, messages are divided into packets at the source node and then sent into a network. Packets move along a route determined by the routing algorithm and traverse through a series of network nodes and finally arrive at the destination node. Packet switching is utilized in most of NoC platforms because of its potential for providing simultaneous data communication between many source-destination pairs. It can be further classified into three classes: store and forward (SAF), virtual cut through (VCT), and wormhole switching. The most commonly used approach for an NoC architecture is wormhole switching because it only requires a buffer size of one transmission unit called flit so that the area cost of a router can be kept low. In contrast, SAF and VCT require a buffer size equivalent to the whole packet which prohibits their adoption. 2) Topology Development Topology defines how nodes are placed and connected, affecting the bandwidth and latency of a network. Many different topologies have been proposed [6], such as mesh, torus, mixed and custom topology, as shown in Fig. 1. Some researchers have proposed the application-specific topology that can offer superior performance while minimizing area and energy consumption [12]. The most common topologies are 2D mesh and torus due to their grid-type shapes and regular structure which are the most appropriate for the two dimensional layout on a chip. 3) Routing Policy Routing is the mechanism responsible for determining the path that a packet traverses from the source node to the destination node. Routing algorithms such as deterministic and adaptive ones have been proposed. With deterministic routing, the path between source-destination pair is fixed, regardless of the current state of the network. On the other hand, an adaptive routing algorithm takes the network state into account when deciding a route, resulting in variation of the routing path with time. For example, it may choose an alternative path when congestion occurs. This explains why it has the potential of supporting more traffic for the same network topology. Mesh Torus Mixed Custom Figure 1. NoC topologies PE Execution Unit Program RAM Data RAM Processor Core Figure 2. Example of 4x4 network B. System Platform platform is a scalable, flexible and reconfigurable multiprocessor platform which meets the high-performance and low-power requirements. implements the wormhole packet switching technique and is based on a 2D mesh topology as shown in Fig. 2. Each node in consists of a router and a local IP which can be a CPU, DSP, memory block, or application-specific logic. The router connects with its four neighboring routers via six bidirectional links. A key feature of the architecture is the use of two separate vertical links which are employed to construct a deadlock-free network [14]. The network is actually composed of two disjoint subnetworks. One sub-network is responsible for delivering eastbounded packets while the other one is for west-bounded packets. Therefore, there are no cycles in the resource dependence graph [15], preventing deadlocks from happening. This method reduces the design complexity of the router because there is no need for a deadlock aware routing algorithm. To increase network performance, utilizes an adaptive XY routing scheme. When an output port is congested or the output buffer is full, the router selects an alternative output port for packets. Therefore, the link utilization is balanced and network performance improves. III. ARCHITECTURE A. System Platform Description network is constructed by integrating diagonal links to, as presented in Fig. 3. Each node has corresponding 64-bit bidirectional links connecting with its neighbors. Additionally, there are three ports for connection with local PEs: IntR, IntL and Int. Fig. 4 depicts the input and output ports of router and router. network is composed of two sub-networks: E-subnet and W- subnet, represented in dashed arrows and solid arrows in Fig. 3, respectively. The E-subnet is responsible for transferring eastward packets while the W-subnet is responsible for westward traffic. When source PE starts packet transmission, it injects packets into the network via IntR or IntL ports, depending on the direction of destination PE. The IntR port is in charge of injecting packets into the E-subnet and the IntL port is for the W-subnet. Subsequently, packets traverse in one of the sub-networks to their destinations. When packets arrive at the destination node, they are ejected from the Int port. Router

3 a D S (a) Minimal routing (b) Quasi-minimal routing Figure 5. Example of routing path selection Figure 3. networks and links (a) router (b) router Figure 4. Ports description of router and router B. Packet Format With wormhole packet switching, packets are composed of 64-bit flits. We utilized the same packet format defined by [14]. There are four types of packets used for transmission. The single data transfer packet is in size of one flit and is used for transferring 32-bit data. The single command packet is for building control specific protocols between processor elements (PEs) or routers. Multiple data packets, which are used for transmitting large amount of data, have better performance in terms of communication overhead than the single data transmission. The block program/data transfer packets consist of header flit and several body flits which contains the actual program/data. The address of destination PE is represented in the X-dir (horizontal) field and Y-dir (vertical) field in a relative distance format. For instance, if the destination node is on the east side of the source node, the X-dir field has a positive value. The X-dir field has a negative value if the destination node is on the west side of the source node. Relative address representation helps reduce router design effort because the same router can be applied to all network nodes without any modification. We also incorporated the seq_num field in the packet for reordering out-of-order delivery. The single/block data transfer packet has the sourcepe_address field and the application-dependent data_id field in order for the destination PE to identify received data. C. Routing Algorithm Distributed adaptive routing algorithms are applied for. With distributed routing, the selection of the next hop is decided at the current node and the path selection is based on a quasi-minimal routing technique. As Fig. 5 illustrates, if a packet is transferred from node S to node D, there are three alternative paths: a, b, and c. Obviously, path_a is the shortest path. Our preliminary simulation shows that if a minimal routing algorithm is adopted then the shortest paths are always used then there will be severe congestion on diagonal links with low utilization on vertical and horizontal links. Thus, the network performance is impacted by traffic load unbalance. Not obeying a minimal routing strategy, the quasi-minimal routing technique relaxes the output port selection. In this example, it allows packets to take path_b or path_c instead based on path_a congestion status. Although the packet may traverse a longer path, this approach helps balance traffic loads and relieves congestion condition. In order to solve contention at an output port, we employed a fixed-priority arbitration scheme to simplify implementation efforts. In networks, the diagonal input ports are given higher priority than the horizontal and vertical input ports, and IntR and IntL have the lowest priority. IV. EVALUATION In order to demonstrate the router s performance and its feasibility for VLSI implementation, we address the methodology used to analyze the performance, area cost and power consumption of architecture and present the simulation results to validate the new architecture. A. Experimental Setup platform is developed by System-C based cycle accurate simulator called enoc. In the simulator, we can easily set up various network configurations such as network size, topology, buffer size, routing algorithm, priority scheme for router arbitration, and traffic patterns. Traffic generator produce four different synthetic traffic patterns used for measuring the performance. These traffic types include {uniform random, bit complement, bit reverse, matrix transpose} traffic patterns. These patterns define the spatial distribution of packets. In order to apply temporal distribution to transmitted packets, we adopted self-similar traffic generation techniques. Self-similar traffic has been discovered between on-chip modules in MPEG-2 video applications [16] and conventional computer networks [17]. Researchers [18] have shown that self-similar traffic can be generated by aggregating a large number of packet sources which exhibit a long-range dependence property. We used the modeling method proposed in [19] to produce the self-similar traffic. ON/OFF state is imposed on source node to control traffic generation during simulation time. The length of time a node spends in the ON or OFF states is determined by the Pareto distribution (F(x) = 1

4 x -α, 1<α<2). These equations use distributed values, α ON and α OFF to calculate ON and OFF times, as described bellow: 1 αon T = U (1) ON 1 αoff T = U (2) OFF where U is a uniformly distributed value in the range of (, 1], α ON = 1.9 and α OFF = 1.25 are set in the simulation. Although it is objective to simulate using synthetic traffic, practical cases are still essential to validate new schemes. Our simulations adopt realistic application patterns, SPLASH-2[2] benchmarks, which suit 7x7 networks and provide several different application behaviors which include {barnes, fft, lu, ocean, radix, raytrace, water-nsquared, water-spatial} traffic patterns. These applications stand for versatile applications that are commonly used for benchmarking. In this simulation effort, a standard interconnection network measurement setup was used [15]. After packets are generated, they are stored in an infinite queue at the source node, and wait until they are injected into the network. This mechanism referred to as the open-loop measurement configuration isolates the packet generation from the network behavior, i.e. the packet generation is independent of the network condition. Each simulation executes 1, clock cycles for warm-up and then continues for 1, cycles during which router performance and power consumption measurements are conducted. B. Performance Evaluation Latency and throughput are major performance evaluation criteria. Performance evaluation is based on transmission time and traffic load among source and destination pairs. Each flit in enoc is declared as a SystemC object that carries following time stamp and distance variables which are generation time (T g ), injection time (T i ), arrival time (T a ) and inter-node distance (D). Inter-node distance is represented in terms of the number of hops between source-destination pairs. With the information, transmission latency (T L ), queuing delay (T Q ), and blocking time (T ) B of each flit can be calculated by the following equations: T L = T a T g (3) T Q = T i T g (4) T B = T a T i ( D * T clk ) (5) where T L calculates the latency between source and destination node; T Q represents waiting time between generation time and injection time; T B shows blocking time when flits are stuck in buffers to wait for transmission and T clk means clock cycle time. Average inter-node distance correlates closely with transmission performance. Diagonal links are crucial for shortening distance travelled. Our quasi-minimal routing algorithm takes advantages of express alternative links so that the inter-node distance is reduced remarkably. We observed this reduction to be about quarter to half distance less than the original platform. Source and destination pairs located almost diagonally, as in matrix transpose traffic, have noticeable improvement. It is about half distance reduction, and in this case routing mechanisms utilize most of the diagonal links as a result of symmetry between source-destination pairs. Transmission time is composed of queuing time, transmit time and blocking time. From simulation results, we can analyze these time distribution under plenty of traffic patterns by various size networks. Results show that all of these delays are decreased more in other than in different traffic loads. Fig. 6 and Fig. 7 depict comparison of average latency for various traffic patterns and different size mesh networks. For both network sizes, average latency in networks dramatically outperforms networks. In particular, the 4x4 network under bit reverse and matrix transpose traffic, the latency in is nearly constant because the network is capable of resolving all routing resource contentions before network load gets saturated. That is, each source-destination pair can obtain an alternative path that is not occupied by other packets. Saturation load means the point where throughput no longer grows linearly with traffic load. Simulations compare various configurations, and all of the results give the same conclusion which indicates that networks could sustain higher traffic load than networks. Fig. 8 shows 8x8 mesh network saturation load. Each case shows that can effectively transmit more data among networks. We also observe that the improvement for 8x8 networks is more than 4x4 networks, which implies that has a greater impact on systems with more nodes. FIFO sizes also are evaluated in simulations. Results show that increasing FIFO size does not help much with supporting more traffic into networks to improve saturation load (a) 4x4 random (c) 4x4 bit reverse (b) 4x4 bit complement (d) 4x4 matrix transpose Figure 6. Average latency in 4x4 mesh networks (FIFO depth = 4)

5 (a) 8x8 random (b) 8x8 bit complement Figure 9. Block diagram of router. Saturation load (c) 8x8 bit reverse (d) 8x8 matrix transpose Figure 7. Average latency in 8x8 mesh networks (FIFO depth = 4) Gate counts gate counts gate counts power Dmesh power Power (mw) Random Bit-complement Bit-reverse Matrix transpose Traffic patterns FIFO depth(flits) Figure 8. Saturation load in 8x8 mesh networks (FIFO depth = 4) C. Implementation Cost Evaluation Feasibility is also evaluated by estimating hardware cost. router has been designed at Register-Transfer Level (RTL) in Verilog TM HDL. A logic description of router component has been obtained using Synopsys TM Design Compiler and TSMC TM 65nm CMOS generic process technology to perform logic synthesis and analyze hardware cost information. Various buffer size implementations were also studied. For routers, we referred to the architecture described in [7]. router block diagram is presented in Fig. 9. routers have three sub-routers for processing traffic in the E-subnet, W-subnet and Int output ports. There is a corresponding FIFO associated with each input port. Header processing unit (HPU) processes destination information from the header flit and arbiter logic (AL) is used to decide routing path, perform arbitration and manage the crossbar switch. Hardware synthesized frequency is set to 8 MHz. Area and power results are listed in Fig. 1. From the figure, we can observe that gate counts and power consumption of routers with FIFO depth of 4 or 8 flits are roughly equal to or less than those of router with FIFO depth of 8 or 16 flits Figure 1. Implementation cost and power comparison _FIFO8 _FIFO4 _FIFO16 _FIFO (a) 4x4 random _FIFO8 _FIFO (c) 4x4 bit reverse _FIFO16 _FIFO _FIFO8 _FIFO4 _FIFO16 _FIFO (b) 4x4 bit complement _FIFO8 _FIFO4 _FIFO16 _FIFO (d) 4x4 matrix transpose Figure 11. Average latency in 4x4 mesh networks with similar router cost

6 _FIFO8 _FIFO4 _FIFO16 _FIFO _FIFO8 _FIFO4 _FIFO16 _FIFO (a) 8x8 random (b) 8x8 bit complement _FIFO8 _FIFO16 _FIFO8 _FIFO16 _FIFO4 _FIFO8 _FIFO4 _FIFO Power (mw) 6 5 random bitcomplement matrixtranspose bitreverse Traffic patterns (a) Average latency.2.4 (c) 8x8 bit reverse.2.4 (d) 8x8 matrix transpose Figure 12. Average latency in 8x8 mesh networks with similar router cost Performance comparisons of different configurations are shown in Figures 11 and 12. It is evident that networks have shorter latency than networks with nearly equivalent hardware cost. For example, for 8x8 mesh networks in random traffic, with a FIFO depth of 4 flits has a shorter latency than with FIFO depth of 8 or 16 flits. All other configurations have similar results. The results also show that the improvements from diagonal links are more than those from larger buffers. From this point of view, is a more area-efficient architecture than other architectures by evaluating performance with the nearly equivalent hardware cost. D. Power Consumption Evaluation Statistical power model methodology is used in extracting the power model from and networks physical implementation [13][21]. Power analyses are achieved by Synopsys PrimeTime TM PX tool [22] which can analyze NoC platform power consumption in nanosecond details waveform. Furthermore, regression analysis is used to characterize related parameters and form the power equation. Based on this methodology, power model is produced and incorporated into our SystemC simulator so that we can estimate performance and power consumption simultaneously for each different benchmark. Power consumption for NoC platform mainly attributes to router and FIFO components. We further simplify NoC system power model as combination of router model and FIFO model listed as following: random bitcomplement matrixtranspose bitreverse Traffic patterns (b) Average power consumption Figure 13. Comparison of average latency and power consumption for synthetic traffic patterns Power (mw) SPLASH-2 benchmark (a) Average latency barnes fft lu ocean radix raytrace waternsquared waterspatial P router = α + α H Ψ H + α S Ψ S + α S Ψ S (6) where Ψ H is hamming distance of outgoing flits; Ψ S is the number of outgoing ports passing body flits; Ψ S is the number of state transitions of outgoing ports; α H, α S and α S are the regression coefficients for relative variable and α is the power term which is independent of the variables. 5 (b) Average power consumption barnes fft lu ocean radix raytrace waternsquared SPLASH-2 benchmark waterspatial Figure 14. Comparison of average latency and power consumption for SPLASH-2 traffic patterns

7 P FIFO = α + α HW Ψ HW + α HR Ψ HR (7) where Ψ HW is hamming distance of write flits; Ψ HR is hamming distance of read flits; α HW and α HR are the regression coefficients for relative variable and α is the power term which is independent of the variables. In order to evaluate power consumption comprehensively, both synthetic traffic patterns and real application traffic patterns are used to estimate and compare system power consumption for different router scenarios. Fig. 13 depicts performance and power consumption for synthetic traffic patterns in 8x8 mesh networks. The results reveal that networks shorten both latency and power consumption compare with networks. Although power consumption of individual routers in networks is slightly higher than routers, flits traverse less distance through networks and experience less FIFO access so that the overall power consumption outperforms networks based on traditional horizontal and vertical links. Latency in networks is about seventy-seven to ninety percent of networks which means flits in either traverse fewer hops or block less time than. The results also imply its alternative express links to diagonal direction shorten traveling distance between source and destination nodes. These characteristics also explain that system power consumption is about seventy to eighty percent compared with networks. For application patterns, SPLASH-2 benchmarks provide several different application cases to evaluate networks. Fig. 14 shows that latency in networks is about sixty to eighty percent of networks and power consumption is about eighty percent for different application patterns. Accelerating transmission tasks not only contributes to latency and power consumption improvement of tasks themselves but also enables system to enter power saving mode to lower overall system power consumption for the rest of data transmission time. V. CONCLUSION AND FUTURE WORK Flexible router design and adaptive routing algorithm not only effectively exploit area and power consumption, but also support more advanced features to accommodate various services using an NoC platform. We have developed an innovative NoC architecture and demonstrated its performance enhancement over traditional horizontal and vertical links networks. Experimental results reveal that networks outperform networks in transmission latency and saturation load. Implementation results clearly demonstrate that our approach is more area and power efficient than previously reported results. With alternative links employed between routers, we anticipate that has the potential of supporting better congestion control schemes, differentiated services and fault tolerance capability to accommodate more diverse services in the future. Besides performance, more features are needed to be employed in next generation NoC architecture. Our next steps to advanced NoC routers are to implement congestion control mechanisms and differentiated services so as to enhance transmission quality. We will also focus on the robustness of router architecture to tolerate unpredictable transmission errors or physical infrastructure failures in an NoC architecture. REFERENCES [1] S. Bell et al., "TILE64 Processor: A 64-Core SoC with Mesh Interconnect," Solid-State Circuits Conference, 28. Digest of Technical Papers. IEEE International, pp , 28. [2] S. Vangal et al., "An 8-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS," Solid-State Circuits Conference, 27. Digest of Technical Papers. IEEE International, pp , 27. [3] W. J. Dally and B. Towles, "Route packets, not wires: on-chip interconnection networks," Design Automation Conference, 21. Proceedings, pp , 21. [4] L. Benini and G. De Micheli, "Networks on chip: a new paradigm for systems on chip design," Design, Automation and Test in Europe Conference and Exhibition, 22. Proceedings, pp , 22. [5] T. Bjerregaard and S. Mahadevan, "A survey of research and practices of Network-on-chip," ACM Computing Surveys, vol. 38, pp. 1, 26. [6] E. Salminen, A. Kulmala, and Timo D. Hamalainen, "Survey of Network-on-Chip proposals," White Paper, OCP-IP, March 28. [7] J. H. Bahn, S. E. Lee, Y. S. Yang, J. Yang and N. Bagherzadeh, "On Design and Application Mapping of a Network-on-Chip(NoC) Architecture," Parallel Letters, vol. 18, pp , 28. [8] M. Igarashi et al., "A Diagonal-Interconnect Architecture and Its Application to RISC Core Design," IEIC Technical Report (Institute of Electronics, Information and Communication Engineers), vol. 12, pp , 22. [9] S. L. Teig, "The X architecture: Not your father's diagonal wiring," Proceedings of the 22 International Workshop on System-level Interconnect Prediction, pp [1] K. Chang, J. Shen and T. Chen, "Evaluation and design trade-offs between circuit-switched and packet-switched NOCs for applicationspecific SOCs," in DAC '6: Proceedings of the 43rd Annual Conference on Design Automation, 26, pp [11] P. Marchal et al., "Spatial division multiplexing: a novel approach for guaranteed throughput on NoCs," Hardware/Software Codesign and System Synthesis, 25. CODES+ISSS '5. Third IEEE/ACM/IFIP International Conference on, pp , 25. [12] J. Hu, Y. Deng and R. Marculescu, "System-level point-to-point communication synthesis using floorplanning information," in ASP- DAC '2: Proceedings of the 22 Conference on Asia South Pacific Design automation/vlsi Design, 22, pp [13] S. E. Lee and N. Bagherzadeh, "A Variable Frequency Link for a Power-Aware Network-on-Chip (NoC), " Integration, the VLSI Journal, Elsevier. 29. [14] J. H. Bahn, S. E. Lee and N. Bagherzadeh, "On Design and Analysis of a Feasible Network-on-Chip (NoC) Architecture," Information Technology, 27.ITNG'7.Fourth International Conference on, pp , 27. [15] W. J. Dally, Principles and Practices of Interconnection Networks. Morgan Kaufmann, 24. [16] G. Varatkar and R. Marculescu, "Traffic analysis for on-chip networks design of multimedia applications," in DAC '2: Proceedings of the 39th Conference on Design Automation, 22, pp [17] W. E. Leland, M. S. Taqqu, W. Willinger and D. V. Wilson, "On the self-similar nature of Ethernet traffic (extended version)," Networking, IEEE/ACM Transactions on, vol. 2, pp. 1-15, [18] M. S. Taqqu, W. Willinger and R. Sherman, "Proof of a fundamental result in self-similar traffic modeling," SIGCOMM Comput. Commun. Rev., vol. 27, pp. 5-23, [19] D. R. Avresky, "Performance evaluation of the ServerNet(R) SAN under self-similar traffic," Parallel and Distributed, th International and 1th Symposium on Parallel and Distributed, IPPS/SPDP. Proceedings, pp , [2] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, A. Gupta, The SPLASH-2 Programs: Characterization and Methodological Considerations, in ISCA 95: Proceedings of the 22nd annual international symposium on Computer architecture, 1995, pp [21] S. E. Lee and N. Bagherzadeh, "A High-level Power Model for Network-on-Chip (NoC) Router, " Computer & Electrical Engineering, Elsevier, 29. [22] Synopsys, Synopsys design compiler, primetime px ( synopsys.com)

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information

Design of a router for network-on-chip. Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh

Design of a router for network-on-chip. Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh 98 Int. J. High Performance Systems Architecture, Vol. 1, No. 2, 27 Design of a router for network-on-chip Jun Ho Bahn,* Seung Eun Lee and Nader Bagherzadeh Department of Electrical Engineering and Computer

More information

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK DOI: 10.21917/ijct.2012.0092 HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK U. Saravanakumar 1, R. Rangarajan 2 and K. Rajasekar 3 1,3 Department of Electronics and Communication

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

Design and Implementation of Multistage Interconnection Networks for SoC Networks

Design and Implementation of Multistage Interconnection Networks for SoC Networks International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.5, October 212 Design and Implementation of Multistage Interconnection Networks for SoC Networks Mahsa

More information

Congestion-Aware Network-on-Chip Router Architecture

Congestion-Aware Network-on-Chip Router Architecture Congestion-Aware Network-on-Chip Router Architecture Chifeng ang, en-hsiang Hu, Nader Bagherzadeh Dept. of lectrical ngineering and Computer Science University of California, Irvine Irvine, 92697 USA mail:

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC QoS Aware BiNoC Architecture Shih-Hsin Lo, Ying-Cherng Lan, Hsin-Hsien Hsien Yeh, Wen-Chung Tsai, Yu-Hen Hu, and Sao-Jie Chen Ying-Cherng Lan CAD System Lab Graduate Institute of Electronics Engineering

More information

A Novel Energy Efficient Source Routing for Mesh NoCs

A Novel Energy Efficient Source Routing for Mesh NoCs 2014 Fourth International Conference on Advances in Computing and Communications A ovel Energy Efficient Source Routing for Mesh ocs Meril Rani John, Reenu James, John Jose, Elizabeth Isaac, Jobin K. Antony

More information

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Maheswari Murali * and Seetharaman Gopalakrishnan # * Assistant professor, J. J. College of Engineering and Technology,

More information

Networks-on-Chip Router: Configuration and Implementation

Networks-on-Chip Router: Configuration and Implementation Networks-on-Chip : Configuration and Implementation Wen-Chung Tsai, Kuo-Chih Chu * 2 1 Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 413, Taiwan,

More information

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology Surbhi Jain Naveen Choudhary Dharm Singh ABSTRACT Network on Chip (NoC) has emerged as a viable solution to the complex communication

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

NED: A Novel Synthetic Traffic Pattern for Power/Performance Analysis of Network-on-chips Using Negative Exponential Distribution

NED: A Novel Synthetic Traffic Pattern for Power/Performance Analysis of Network-on-chips Using Negative Exponential Distribution To appear in Int l Journal of Low Power Electronics, American Scientific Publishers, 2009 NED: A Novel Synthetic Traffic Pattern for Power/Performance Analysis of Network-on-chips Using Negative Exponential

More information

Demand Based Routing in Network-on-Chip(NoC)

Demand Based Routing in Network-on-Chip(NoC) Demand Based Routing in Network-on-Chip(NoC) Kullai Reddy Meka and Jatindra Kumar Deka Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, India Abstract

More information

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip Anh T. Tran and Bevan M. Baas Department of Electrical and Computer Engineering University of California - Davis, USA {anhtr,

More information

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

Design and Implementation of Buffer Loan Algorithm for BiNoC Router Design and Implementation of Buffer Loan Algorithm for BiNoC Router Deepa S Dev Student, Department of Electronics and Communication, Sree Buddha College of Engineering, University of Kerala, Kerala, India

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari Global Journal of Computer Science and Technology: E Network, Web & Security Volume 15 Issue 6 Version 1.0 Year 2015 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution

Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution Dynamic Stress Wormhole Routing for Spidergon NoC with effective fault tolerance and load distribution Nishant Satya Lakshmikanth sailtosatya@gmail.com Krishna Kumaar N.I. nikrishnaa@gmail.com Sudha S

More information

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems Mohammad Ali Jabraeil Jamali, Ahmad Khademzadeh Abstract The success of an electronic system in a System-on- Chip is highly

More information

WITH the development of the semiconductor technology,

WITH the development of the semiconductor technology, Dual-Link Hierarchical Cluster-Based Interconnect Architecture for 3D Network on Chip Guang Sun, Yong Li, Yuanyuan Zhang, Shijun Lin, Li Su, Depeng Jin and Lieguang zeng Abstract Network on Chip (NoC)

More information

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Philipp Gorski, Tim Wegner, Dirk Timmermann University

More information

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip Nasibeh Teimouri

More information

International Journal of Research and Innovation in Applied Science (IJRIAS) Volume I, Issue IX, December 2016 ISSN

International Journal of Research and Innovation in Applied Science (IJRIAS) Volume I, Issue IX, December 2016 ISSN Comparative Analysis of Latency, Throughput and Network Power for West First, North Last and West First North Last Routing For 2D 4 X 4 Mesh Topology NoC Architecture Bhupendra Kumar Soni 1, Dr. Girish

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

A Wireless Network-on-Chip Design for Multicore Platforms

A Wireless Network-on-Chip Design for Multicore Platforms 211 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing A Wireless Network-on-Chip Design for Multicore Platforms Chifeng Wang, Wen-Hsiang Hu, Nader Bagherzadeh

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information

Evaluation of NOC Using Tightly Coupled Router Architecture

Evaluation of NOC Using Tightly Coupled Router Architecture IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 01-05 www.iosrjournals.org Evaluation of NOC Using Tightly Coupled Router

More information

Escape Path based Irregular Network-on-chip Simulation Framework

Escape Path based Irregular Network-on-chip Simulation Framework Escape Path based Irregular Network-on-chip Simulation Framework Naveen Choudhary College of technology and Engineering MPUAT Udaipur, India M. S. Gaur Malaviya National Institute of Technology Jaipur,

More information

ERA: An Efficient Routing Algorithm for Power, Throughput and Latency in Network-on-Chips

ERA: An Efficient Routing Algorithm for Power, Throughput and Latency in Network-on-Chips : An Efficient Routing Algorithm for Power, Throughput and Latency in Network-on-Chips Varsha Sharma, Rekha Agarwal Manoj S. Gaur, Vijay Laxmi, and Vineetha V. Computer Engineering Department, Malaviya

More information

A Literature Review of on-chip Network Design using an Agent-based Management Method

A Literature Review of on-chip Network Design using an Agent-based Management Method A Literature Review of on-chip Network Design using an Agent-based Management Method Mr. Kendaganna Swamy S Dr. Anand Jatti Dr. Uma B V Instrumentation Instrumentation Communication Bangalore, India Bangalore,

More information

PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs

PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs PDA-HyPAR: Path-Diversity-Aware Hybrid Planar Adaptive Routing Algorithm for 3D NoCs Jindun Dai *1,2, Renjie Li 2, Xin Jiang 3, Takahiro Watanabe 2 1 Department of Electrical Engineering, Shanghai Jiao

More information

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections A.SAI KUMAR MLR Group of Institutions Dundigal,INDIA B.S.PRIYANKA KUMARI CMR IT Medchal,INDIA Abstract Multiple

More information

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology Outline SoC Interconnect NoC Introduction NoC layers Typical NoC Router NoC Issues Switching

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER A Thesis by SUNGHO PARK Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

A Strategy for Interconnect Testing in Stacked Mesh Network-on- Chip

A Strategy for Interconnect Testing in Stacked Mesh Network-on- Chip 2010 25th International Symposium on Defect and Fault Tolerance in VLSI Systems A Strategy for Interconnect Testing in Stacked Mesh Network-on- Chip Min-Ju Chan and Chun-Lung Hsu Department of Electrical

More information

Efficient And Advance Routing Logic For Network On Chip

Efficient And Advance Routing Logic For Network On Chip RESEARCH ARTICLE OPEN ACCESS Efficient And Advance Logic For Network On Chip Mr. N. Subhananthan PG Student, Electronics And Communication Engg. Madha Engineering College Kundrathur, Chennai 600 069 Email

More information

OASIS Network-on-Chip Prototyping on FPGA

OASIS Network-on-Chip Prototyping on FPGA Master thesis of the University of Aizu, Feb. 20, 2012 OASIS Network-on-Chip Prototyping on FPGA m5141120, Kenichi Mori Supervised by Prof. Ben Abdallah Abderazek Adaptive Systems Laboratory, Master of

More information

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect 1 A Soft Tolerant Network-on-Chip Router Pipeline for Multi-core Systems Pavan Poluri and Ahmed Louri Department of Electrical and Computer Engineering, University of Arizona Email: pavanp@email.arizona.edu,

More information

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 02, 2014 ISSN (online): 2321-0613 Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

Asynchronous Bypass Channel Routers

Asynchronous Bypass Channel Routers 1 Asynchronous Bypass Channel Routers Tushar N. K. Jain, Paul V. Gratz, Alex Sprintson, Gwan Choi Department of Electrical and Computer Engineering, Texas A&M University {tnj07,pgratz,spalex,gchoi}@tamu.edu

More information

A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin

A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin 50 IEEE EMBEDDED SYSTEMS LETTERS, VOL. 1, NO. 2, AUGUST 2009 A Layer-Multiplexed 3D On-Chip Network Architecture Rohit Sunkam Ramanujam and Bill Lin Abstract Programmable many-core processors are poised

More information

Bursty Communication Performance Analysis of Network-on-Chip with Diverse Traffic Permutations

Bursty Communication Performance Analysis of Network-on-Chip with Diverse Traffic Permutations International Journal of Soft Computing and Engineering (IJSCE) Bursty Communication Performance Analysis of Network-on-Chip with Diverse Traffic Permutations Naveen Choudhary Abstract To satisfy the increasing

More information

Network-on-Chip Micro-Benchmarks

Network-on-Chip Micro-Benchmarks Network-on-Chip Micro-Benchmarks Zhonghai Lu *, Axel Jantsch *, Erno Salminen and Cristian Grecu * Royal Institute of Technology, Sweden Tampere University of Technology, Finland Abstract University of

More information

AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIP

AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIP AREA-EFFICIENT DESIGN OF SCHEDULER FOR ROUTING NODE OF NETWORK-ON-CHIP Rehan Maroofi, 1 V. N. Nitnaware, 2 and Dr. S. S. Limaye 3 1 Department of Electronics, Ramdeobaba Kamla Nehru College of Engg, Nagpur,

More information

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU Thomas Moscibroda Microsoft Research Onur Mutlu CMU CPU+L1 CPU+L1 CPU+L1 CPU+L1 Multi-core Chip Cache -Bank Cache -Bank Cache -Bank Cache -Bank CPU+L1 CPU+L1 CPU+L1 CPU+L1 Accelerator, etc Cache -Bank

More information

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design

A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design A Thermal-aware Application specific Routing Algorithm for Network-on-chip Design Zhi-Liang Qian and Chi-Ying Tsui VLSI Research Laboratory Department of Electronic and Computer Engineering The Hong Kong

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

Deadlock and Livelock. Maurizio Palesi

Deadlock and Livelock. Maurizio Palesi Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

NOC Deadlock and Livelock

NOC Deadlock and Livelock NOC Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links

FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links FT-Z-OE: A Fault Tolerant and Low Overhead Routing Algorithm on TSV-based 3D Network on Chip Links Hoda Naghibi Jouybari College of Electrical Engineering, Iran University of Science and Technology, Tehran,

More information

Multi-path Routing for Mesh/Torus-Based NoCs

Multi-path Routing for Mesh/Torus-Based NoCs Multi-path Routing for Mesh/Torus-Based NoCs Yaoting Jiao 1, Yulu Yang 1, Ming He 1, Mei Yang 2, and Yingtao Jiang 2 1 College of Information Technology and Science, Nankai University, China 2 Department

More information

Phastlane: A Rapid Transit Optical Routing Network

Phastlane: A Rapid Transit Optical Routing Network Phastlane: A Rapid Transit Optical Routing Network Mark Cianchetti, Joseph Kerekes, and David Albonesi Computer Systems Laboratory Cornell University The Interconnect Bottleneck Future processors: tens

More information

Lecture 18: Communication Models and Architectures: Interconnection Networks

Lecture 18: Communication Models and Architectures: Interconnection Networks Design & Co-design of Embedded Systems Lecture 18: Communication Models and Architectures: Interconnection Networks Sharif University of Technology Computer Engineering g Dept. Winter-Spring 2008 Mehdi

More information

A DRAM Centric NoC Architecture and Topology Design Approach

A DRAM Centric NoC Architecture and Topology Design Approach 11 IEEE Computer Society Annual Symposium on VLSI A DRAM Centric NoC Architecture and Topology Design Approach Ciprian Seiculescu, Srinivasan Murali, Luca Benini, Giovanni De Micheli LSI, EPFL, Lausanne,

More information

Improving Routing Efficiency for Network-on-Chip through Contention-Aware Input Selection

Improving Routing Efficiency for Network-on-Chip through Contention-Aware Input Selection Improving Routing Efficiency for Network-on-Chip through Contention-Aware Input Selection Dong Wu, Bashir M. Al-Hashimi, Marcus T. Schmitz School of Electronics and Computer Science University of Southampton

More information

OASIS NoC Architecture Design in Verilog HDL Technical Report: TR OASIS

OASIS NoC Architecture Design in Verilog HDL Technical Report: TR OASIS OASIS NoC Architecture Design in Verilog HDL Technical Report: TR-062010-OASIS Written by Kenichi Mori ASL-Ben Abdallah Group Graduate School of Computer Science and Engineering The University of Aizu

More information

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS 1 JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS Shabnam Badri THESIS WORK 2011 ELECTRONICS JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

More information

ISSN Vol.03,Issue.06, August-2015, Pages:

ISSN Vol.03,Issue.06, August-2015, Pages: WWW.IJITECH.ORG ISSN 2321-8665 Vol.03,Issue.06, August-2015, Pages:0920-0924 Performance and Evaluation of Loopback Virtual Channel Router with Heterogeneous Router for On Chip Network M. VINAY KRISHNA

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

Small Virtual Channel Routers on FPGAs Through Block RAM Sharing

Small Virtual Channel Routers on FPGAs Through Block RAM Sharing 8 8 2 1 2 Small Virtual Channel rs on FPGAs Through Block RAM Sharing Jimmy Kwa, Tor M. Aamodt ECE Department, University of British Columbia Vancouver, Canada jkwa@ece.ubc.ca aamodt@ece.ubc.ca Abstract

More information

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns

The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns The Effect of Adaptivity on the Performance of the OTIS-Hypercube under Different Traffic Patterns H. H. Najaf-abadi 1, H. Sarbazi-Azad 2,1 1 School of Computer Science, IPM, Tehran, Iran. 2 Computer Engineering

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel Hyoukjun Kwon and Tushar Krishna Georgia Institute of Technology Synergy Lab (http://synergy.ece.gatech.edu) hyoukjun@gatech.edu April

More information

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013 NetSpeed ORION: A New Approach to Design On-chip Interconnects August 26 th, 2013 INTERCONNECTS BECOMING INCREASINGLY IMPORTANT Growing number of IP cores Average SoCs today have 100+ IPs Mixing and matching

More information

WITH THE CONTINUED advance of Moore s law, ever

WITH THE CONTINUED advance of Moore s law, ever IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 11, NOVEMBER 2011 1663 Asynchronous Bypass Channels for Multi-Synchronous NoCs: A Router Microarchitecture, Topology,

More information

Available online at ScienceDirect. Procedia Computer Science 89 (2016 )

Available online at  ScienceDirect. Procedia Computer Science 89 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 180 186 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) A Perspective on

More information

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH

Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip. Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Overlaid Mesh Topology Design and Deadlock Free Routing in Wireless Network-on-Chip Danella Zhao and Ruizhe Wu Presented by Zhonghai Lu, KTH Outline Introduction Overview of WiNoC system architecture Overlaid

More information

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and

More information

The Benefits of Using Clock Gating in the Design of Networks-on-Chip

The Benefits of Using Clock Gating in the Design of Networks-on-Chip The Benefits of Using Clock Gating in the Design of Networks-on-Chip Michele Petracca, Luca P. Carloni Dept. of Computer Science, Columbia University, New York, NY 127 Abstract Networks-on-chip (NoC) are

More information

Sanaz Azampanah Ahmad Khademzadeh Nader Bagherzadeh Majid Janidarmian Reza Shojaee

Sanaz Azampanah Ahmad Khademzadeh Nader Bagherzadeh Majid Janidarmian Reza Shojaee Sanaz Azampanah Ahmad Khademzadeh Nader Bagherzadeh Majid Janidarmian Reza Shojaee Application-Specific Routing Algorithm Selection Function Look-Ahead Traffic-aware Execution (LATEX) Algorithm Experimental

More information

Transaction Level Model Simulator for NoC-based MPSoC Platform

Transaction Level Model Simulator for NoC-based MPSoC Platform Proceedings of the 6th WSEAS International Conference on Instrumentation, Measurement, Circuits & Systems, Hangzhou, China, April 15-17, 27 17 Transaction Level Model Simulator for NoC-based MPSoC Platform

More information

Traffic Generation and Performance Evaluation for Mesh-based NoCs

Traffic Generation and Performance Evaluation for Mesh-based NoCs Traffic Generation and Performance Evaluation for Mesh-based NoCs Leonel Tedesco ltedesco@inf.pucrs.br Aline Mello alinev@inf.pucrs.br Diego Garibotti dgaribotti@inf.pucrs.br Ney Calazans calazans@inf.pucrs.br

More information

An Analysis of Blocking vs Non-Blocking Flow Control in On-Chip Networks

An Analysis of Blocking vs Non-Blocking Flow Control in On-Chip Networks An Analysis of Blocking vs Non-Blocking Flow Control in On-Chip Networks ABSTRACT High end System-on-Chip (SoC) architectures consist of tens of processing engines. These processing engines have varied

More information

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network

Global Adaptive Routing Algorithm Without Additional Congestion Propagation Network 1 Global Adaptive Routing Algorithm Without Additional Congestion ropagation Network Shaoli Liu, Yunji Chen, Tianshi Chen, Ling Li, Chao Lu Institute of Computing Technology, Chinese Academy of Sciences

More information

Design of a System-on-Chip Switched Network and its Design Support Λ

Design of a System-on-Chip Switched Network and its Design Support Λ Design of a System-on-Chip Switched Network and its Design Support Λ Daniel Wiklund y, Dake Liu Dept. of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract As the degree of

More information

High Performance Interconnect and NoC Router Design

High Performance Interconnect and NoC Router Design High Performance Interconnect and NoC Router Design Brinda M M.E Student, Dept. of ECE (VLSI Design) K.Ramakrishnan College of Technology Samayapuram, Trichy 621 112 brinda18th@gmail.com Devipoonguzhali

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects 1 EECS 598: Integrating Emerging Technologies with Computer Architecture Lecture 12: On-Chip Interconnects Instructor: Ron Dreslinski Winter 216 1 1 Announcements Upcoming lecture schedule Today: On-chip

More information

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Module 17: Interconnection Networks Lecture 37: Introduction to Routers Interconnection Networks. Fundamentals. Latency and bandwidth Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012

More information

SONA: An On-Chip Network for Scalable Interconnection of AMBA-Based IPs*

SONA: An On-Chip Network for Scalable Interconnection of AMBA-Based IPs* SONA: An On-Chip Network for Scalable Interconnection of AMBA-Based IPs* Eui Bong Jung 1, Han Wook Cho 1, Neungsoo Park 2, and Yong Ho Song 1 1 College of Information and Communications, Hanyang University,

More information

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Young Hoon Kang, Taek-Jun Kwon, and Jeff Draper {youngkan, tjkwon, draper}@isi.edu University of Southern California

More information

A Hybrid Interconnection Network for Integrated Communication Services

A Hybrid Interconnection Network for Integrated Communication Services A Hybrid Interconnection Network for Integrated Communication Services Yi-long Chen Northern Telecom, Inc. Richardson, TX 7583 kchen@nortel.com Jyh-Charn Liu Department of Computer Science, Texas A&M Univ.

More information

Power and Area Efficient NOC Router Through Utilization of Idle Buffers

Power and Area Efficient NOC Router Through Utilization of Idle Buffers Power and Area Efficient NOC Router Through Utilization of Idle Buffers Mr. Kamalkumar S. Kashyap 1, Prof. Bharati B. Sayankar 2, Dr. Pankaj Agrawal 3 1 Department of Electronics Engineering, GRRCE Nagpur

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

An Energy-Efficient Reconfigurable NoC Architecture with RF-Interconnects

An Energy-Efficient Reconfigurable NoC Architecture with RF-Interconnects 2013 16th Euromicro Conference on Digital System Design An Energy-Efficient Reconfigurable NoC Architecture with RF-Interconnects Majed ValadBeigi, Farshad Safaei and Bahareh Pourshirazi Faculty of ECE,

More information

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems 1 Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems Ronald Dreslinski, Korey Sewell, Thomas Manville, Sudhir Satpathy, Nathaniel Pinckney, Geoff Blake, Michael Cieslak, Reetuparna

More information

The Design and Implementation of a Low-Latency On-Chip Network

The Design and Implementation of a Low-Latency On-Chip Network The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current

More information

On an Overlaid Hybrid Wire/Wireless Interconnection Architecture for Network-on-Chip

On an Overlaid Hybrid Wire/Wireless Interconnection Architecture for Network-on-Chip On an Overlaid Hybrid Wire/Wireless Interconnection Architecture for Network-on-Chip Ling Wang, Zhihai Guo, Peng Lv Dept. of Computer Science and Technology Harbin Institute of Technology Harbin, China

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors

On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors On Topology and Bisection Bandwidth of Hierarchical-ring Networks for Shared-memory Multiprocessors Govindan Ravindran Newbridge Networks Corporation Kanata, ON K2K 2E6, Canada gravindr@newbridge.com Michael

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Modeling the Traffic Effect for the Application Cores Mapping Problem onto NoCs

Modeling the Traffic Effect for the Application Cores Mapping Problem onto NoCs Modeling the Traffic Effect for the Application Cores Mapping Problem onto NoCs César A. M. Marcon 1, José C. S. Palma 2, Ney L. V. Calazans 1, Fernando G. Moraes 1, Altamiro A. Susin 2, Ricardo A. L.

More information

Energy Efficient and Congestion-Aware Router Design for Future NoCs

Energy Efficient and Congestion-Aware Router Design for Future NoCs 216 29th International Conference on VLSI Design and 216 15th International Conference on Embedded Systems Energy Efficient and Congestion-Aware Router Design for Future NoCs Wazir Singh and Sujay Deb

More information

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized

More information