High Performance Interconnect and NoC Router Design

Brinda M
M.E Student, Dept. of ECE (VLSI Design), K.Ramakrishnan College of Technology, Samayapuram, Trichy 621 112
brinda18th@gmail.com

Devipoonguzhali S
Assistant Professor, Dept. of ECE, K.Ramakrishnan College of Technology, Samayapuram, Trichy 621 112
sdp.krct@gmail.com

Abstract: The increasing number of transistors on a single chip enables complex systems to be built using the SoC approach. On-chip communication networks address the resulting interconnect problem by making communication within a single chip efficient. This paper presents the design of an on-chip router and communication link. The proposed router is based on a weighted routing algorithm that routes packets efficiently. An error detection mechanism is proposed for handling faulty parts of the NoC. A low-energy semi-serial on-chip communication link is also designed for long-range communication between routers. The proposed router and semi-serial link reduce power consumption, hardware utilization (number of slices and LUTs) and latency.

Index Terms: Network on Chip (NoC), System on Chip (SoC), Weighted Routing Algorithm, Dual Rail Encoder, Pulse Signaling, Data Validity Detector.

I. INTRODUCTION

Semiconductor technology scaling creates the potential for system-on-chip (SoC) integration, in which a complete electronic system is integrated on a single die. Shared-bus interconnection between SoC components becomes a tangle ("spaghetti") as billions of transistors are integrated on a single IC. A simple NoC is a good solution for the efficient implementation of an SoC. A NoC is a communication infrastructure in which each component of the SoC is viewed as a node. A NoC uses packets to route data from the source to the destination component via switches (routers) and point-to-point links.

1.1 On-Chip Router

The router is the most important component and acts as the heart of a NoC. It provides the communication backbone of the NoC system and directs each packet from source to destination, so it should be designed for maximum efficiency and throughput. Routers are intelligent devices that receive incoming data packets and inspect their destinations. A routing algorithm is used to determine the best path for the data from source to destination. In this paper a weighted xy algorithm is proposed.

1.2 Global On-Chip Link

Links are used to transmit packets between routers. They physically connect the nodes and enable communication in the network. NoC links are a primary challenge for high-performance, high-complexity SoCs, because transmitting clock, data and communication signals over a large die area requires long interconnections. As technology scales, long interconnection links degrade in performance, which in turn increases power consumption and latency. The Globally Asynchronous Locally Synchronous (GALS) scheme has been proposed as a solution for low-latency, low-power links between cores. GALS facilitates the integration of independently designed blocks operating at different frequencies and allows better energy saving because each functional unit has its own independent clock.

The remainder of this paper is organized as follows. The proposed NoC router is presented in Section II. The proposed semi-serial links are presented in Section III. Simulation results and analysis of the proposed work are presented in Section IV. Finally, the conclusions drawn from this work are discussed in Section V.

II. THE PROPOSED NOC ROUTER ARCHITECTURE

A NoC router is composed of a number of input ports, a number of output ports and a local port to access the IP core; usually five ports are used, as shown in Fig. 1. Four of the five ports face the cardinal directions (North, South, East and West) and one port is attached to the local processing element (PE).

Fig. 1. General Router Structure

A reliable router architecture that addresses both runtime faults and latency is proposed in this paper. The proposed architecture has eight ports (North, South, East, West, North West, North East, South West and South East), which are shown in Fig. 2.

Fig. 2. 8-port Router

Each port consists of Error Correction (ECC) blocks, Routing Error Detection (RED) blocks, a FIFO buffer, Routing Logic (RL) blocks, control logic and a Finite State Machine (FSM), as shown in Fig. 3.

Fig. 3. Block Diagram of Router Port

A. Error Correction and Detection Blocks

Error correction techniques are used to ensure error-free transmission of messages. The degree of reliability increases if the data are transmitted without any error. Error correction codes are the most common approach to maintaining a good level of reliability. Here, routing error detection is used to find faults during runtime.

B. FIFO Buffer

Each input and output port has its own buffer, which is used to store data temporarily. After the error correction block, the data are stored in the FIFO buffer until the current data have been directed to the corresponding destination port. This avoids collisions between the current and upcoming data.

C. Routing Logic Blocks

The routing algorithm determines how data are forwarded. Routing algorithms differ in how the path for transmitting packets is chosen. In xy routing, which is the most popular routing algorithm, the header flits carry the destination node address. The current router compares its own address with the destination address, and based on the comparison the packet is directed toward its destination node. However, its performance depends entirely on the network load. As a deterministic scheme, it always uses the same predetermined path between a pair of nodes, so it cannot tolerate online faults. To overcome this, weighted-xy (w-xy) routing is proposed. It assigns each output port a weight based on the available bandwidth and on the current and destination (row and column) addresses. That is, directions are selected depending on certain conditions. The constraints and direction selection for the eight ports are given in Table I.
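The port selection described in Section II-C can be illustrated with a short Python sketch. It is only an illustration under stated assumptions and is not the rule set of Table I: the weight is taken to be the free bandwidth of a port, a port counts as a candidate only if it reduces the remaining hop count, and the names `PORTS`, `hops`, `select_port` and `free_bw` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int]  # (row, column); rows are assumed to grow toward the south

# Offsets of the eight neighbours reached through each output port.
PORTS: Dict[str, Coord] = {
    "N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1),
    "NE": (-1, 1), "NW": (-1, -1), "SE": (1, 1), "SW": (1, -1),
}

def hops(a: Coord, b: Coord) -> int:
    """Hop count between two nodes when diagonal moves are allowed."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def select_port(current: Coord, dest: Coord,
                free_bw: Dict[str, float]) -> Optional[str]:
    """Pick the candidate port with the largest weight (assumed: free bandwidth).

    A port that is missing from free_bw, or has zero bandwidth, is treated
    as unavailable or faulty. Returns None when no candidate port is usable.
    """
    if current == dest:
        return "LOCAL"  # deliver to the local processing element
    remaining = hops(current, dest)
    best_port, best_weight = None, 0.0
    for name, (d_row, d_col) in PORTS.items():
        neighbour = (current[0] + d_row, current[1] + d_col)
        if hops(neighbour, dest) >= remaining:
            continue  # this direction does not bring the packet closer
        weight = free_bw.get(name, 0.0)
        if weight > best_weight:
            best_port, best_weight = name, weight
    return best_port

# Three ports lead toward (2, 5); the least loaded one is chosen.
print(select_port((2, 2), (2, 5), {"E": 0.2, "NE": 0.9, "SE": 0.5}))
# If E and NE are unavailable, the packet is steered through SE instead.
print(select_port((2, 2), (2, 5), {"E": 0.0, "NE": 0.0, "SE": 0.5}))
```

Unlike plain xy routing, which would always choose E in this case, the weighted choice can route around a congested or faulty port whenever another port still makes progress toward the destination.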

TABLE I. 8-Port Constraints and Direction Selection

D. Loopback Module

The control logic senses the availability of the neighboring router, and the finite state machine (FSM) obtains this information from the control logic. If the neighboring router is not available, the outgoing data packet is looped back to the input buffer; otherwise the data packet is sent to the next router. Thus latency is reduced.

Fig. 4. Block Diagram of Serial On-Chip Link
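The loopback decision of Section II-D can be sketched in Python as a small behavioural model. This is an assumption-laden illustration rather than the authors' FSM: the class name `PortWithLoopback`, the state names and the choice to re-queue a looped-back packet at the tail of the input FIFO are all hypothetical.

```python
from collections import deque

class PortWithLoopback:
    """Behavioural model of one router port with a loopback path."""

    def __init__(self):
        self.input_fifo = deque()  # input buffer of the port
        self.sent = []             # packets handed to the neighbouring router

    def receive(self, packet):
        self.input_fifo.append(packet)

    def step(self, neighbor_available: bool) -> str:
        """Process one packet; return the action taken in this step."""
        if not self.input_fifo:
            return "IDLE"
        packet = self.input_fifo.popleft()
        if neighbor_available:
            self.sent.append(packet)       # forward to the next router
            return "FORWARD"
        self.input_fifo.append(packet)     # assumed: loop back to the FIFO tail
        return "LOOPBACK"

port = PortWithLoopback()
port.receive("p0")
port.receive("p1")
print(port.step(neighbor_available=False))  # LOOPBACK: p0 returns to the buffer
print(port.step(neighbor_available=True))   # FORWARD: p1 is sent
print(port.step(neighbor_available=True))   # FORWARD: p0 is sent on the retry
```

In this model a looped-back packet is retained rather than dropped, so it can be forwarded as soon as the neighbor becomes available again.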

III. SEMI SERIAL ON-CHIP COMMUNICATION LINK

Scaling down transistor dimensions improves both transistor cost and performance. However, scaling down interconnect cross-sectional dimensions degrades performance. To overcome this problem, self-timed delay-insensitive links are needed that operate correctly in the presence of delay variations in gates and interconnecting wires. A high-speed and power-efficient on-chip interconnect is proposed, as shown in Fig. 4. The presented semi-serial on-chip link is built from high-speed serializer/deserializer circuits, pulse dual-rail encoding, a differential driver and a data validity decoder.

A. Communication Protocol

The semi-serial link presented in this paper is based on Globally Asynchronous Locally Synchronous (GALS) communication, which lets independently designed blocks operate at different frequencies. The handshaking protocol used in this paper is therefore two-phase bundled data. Each sender and receiver module has its own two-phase bundled-data interface. When the request signal (Req2L) reaches the serializer, the data are ready and stable, and they are loaded into the shift register. A stop bit is loaded along with the data and is used to stop shifting in the deserializer, so no additional control logic is needed.

B. Serializer

In the serializer, the parallel-loaded data are converted into serial form. A locally generated clock is sent along with the serializer output. This local clock starts to run once parallel data loading is complete, and it is used for data shifting and dual-rail encoding. The clock is stoppable: it runs only while the shift register holds data to be transmitted and is stopped at all other times, which saves communication power.

C. Dual-rail Encoder and Differential Driver

Dual-rail encoding and differential pulse mode are used for data transmission over the wires. In classical encoding each wire carries a single bit value, whereas dual-rail encoding uses two wires to carry each bit. That is, transmitting N bits of data requires 2N wires. A differential driver is used because of its high performance, noise immunity and better energy efficiency. The driver has two wires: one carries the data and the other carries its complement. Thus no data decoding technique is needed.

D. Data Validity Decoder

The output of the encoder and driver is shifted into the deserializer through the data validity decoder. Its role is simply to generate the clock needed to shift data into the deserializer. In addition to this locally generated clock, the stop bit halts shifting once all the data have been loaded into the deserializer.

E. Deserializer and Receiving Module

When the stop bit arrives at the last flip-flop, shifting is complete and the data are read out in parallel. After this process, a request is sent to the sender. When the acknowledgment is received from the receiving module, the shift register in the deserializer is cleared.

F. Receiver and Driver

The complementary output is fed to the acknowledgement receiver, which sends Ack2L to the sending module.

IV. SIMULATION RESULT

The proposed on-chip router and semi-serial link are implemented in Verilog, and the simulation waveforms were obtained from ModelSim.

Fig. 5. Simulation Waveform Obtained from Modelsim.
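The data path of the semi-serial link in Section III (stop-bit framing, dual-rail encoding and stop-bit-terminated deserialization) can be modelled in a few lines of Python. This is a bit-level sketch under assumptions, not the Verilog design: the function names are hypothetical, the stop bit is assumed to be the first bit shifted out so that it reaches the last flip-flop exactly when all data bits have arrived, and clocking and handshake signals are not modelled.

```python
from typing import List, Tuple

def serialize(word: List[int]) -> List[int]:
    """Frame a parallel word for serial transfer: stop bit first, then data."""
    return [1] + list(word)

def dual_rail_encode(bit: int) -> Tuple[int, int]:
    """Drive two wires per bit: the data value and its complement."""
    return (bit, 1 - bit)

def dual_rail_decode(pair: Tuple[int, int]) -> int:
    """A pair is treated as valid only when the two rails differ."""
    data, complement = pair
    if data == complement:
        raise ValueError("invalid dual-rail pair")
    return data

def deserialize(stream: List[int], width: int) -> List[int]:
    """Shift bits into a cleared register until the stop bit reaches the end."""
    reg = [0] * (width + 1)        # one extra flip-flop for the stop bit
    for bit in stream:
        reg = [bit] + reg[:-1]     # shift toward the last flip-flop
        if reg[-1] == 1:           # stop bit arrived: shifting is complete
            return reg[:-1][::-1]  # parallel read-out in the original order
    raise ValueError("stop bit never reached the last flip-flop")

def transfer(word: List[int]) -> List[int]:
    """Send one word across the modelled link and return what is received."""
    wires = [dual_rail_encode(b) for b in serialize(word)]
    received = [dual_rail_decode(p) for p in wires]
    return deserialize(received, len(word))

word = [1, 0, 1, 1, 0, 0, 1, 0]
print(transfer(word) == word)      # True
print(len(serialize(word)))        # 9 serial bits for an 8-bit word
```

Because the register starts cleared and the stop bit leads the frame, the last flip-flop can only become 1 when the whole word has been shifted in, which is why no separate bit counter is needed in this model.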

Fig. 6. Hardware Consumption Report.

Fig. 7. Latency Report

Fig. 8. Graphical Comparison

TABLE II. Performance Analysis

Hardware Utilization    Estimated Results    Base Paper Results
Slices                  240                  583
LUTs                    429                  691
Latency                 4.114 ns             20.86 ns

V. CONCLUSION

This paper presented an on-chip router that includes a weighted routing algorithm and an error detection and correction mechanism. A semi-serial on-chip link was presented to achieve high throughput and low energy dissipation per bit. Together, the presented on-chip router and link provide high throughput and power efficiency while minimizing latency.
