Hardware Design, Synthesis, and Verification of a Multicore Communications API

Benjamin Meakin, Ganesh Gopalakrishnan
University of Utah, School of Computing
{meakin, ganesh}@cs.utah.edu

Abstract

Modern trends in computer architecture and semiconductor scaling are leading toward chips with more and more processor cores. Highly concurrent hardware and software architectures are inevitable in future systems, and one of the greatest problems in these systems is communication. Providing coherence, consistency, synchronization, and data sharing in a multicore system requires that communication overhead be minimal. It is essential that scalable, flexible, and efficient hardware/software mechanisms be researched and developed to ease the technical community into developing concurrent systems. This research effort aims to create such mechanisms by designing a scalable hardware implementation of a multicore communication API. This API, developed by the Multicore Association, targets embedded devices and aims to provide communication primitives for embedded systems on chips. It is a lightweight message passing interface that offers the potential for greater communication performance and lower power than other solutions, at the expense of broad functionality. Realizing this potential is almost entirely up to the implementation. This paper describes the design, synthesis, and verification of such an implementation. The low latency, low power, and high throughput aspirations of the API are the very performance metrics this implementation seeks to optimize. This is achieved by designing an on-chip network as the physical communication medium and by modifying a MIPS processor to implement an extension to its instruction set, which permits the API to be implemented with direct hardware support. The result of this effort is a useful case study of a hardware design, synthesis, and verification flow for a possible implementation of an emerging multicore communication API.
It will show that such an organization yields performance comparable to current communication architectures, with greater scalability and more potential for future innovation.

1. Introduction

It is widely accepted that modern and future computing systems will see performance improvements primarily through exploiting increased process- and thread-level parallelism. One of the main prerequisites for exploiting this parallelism efficiently is the availability of APIs that are well matched to the communication and synchronization needs of this area. Clearly, one size does not fit all. In large-scale cluster computing, the Message Passing Interface (MPI), a very sophisticated API with over 300 functions, is the lingua franca; it is used to program cluster computers with up to hundreds of thousands of processing nodes. In other realms, such as embedded systems built from commodity microprocessors, various real-time operating system primitives and shared-memory threads serve the needs of communication and synchronization. However, in the rapidly growing area of multicore embedded systems, chips contain not only multiple general-purpose computing cores but also, for cost effectiveness and performance, application-specific accelerators, I/O interfaces, and memory controllers. All of these on-chip devices need low-overhead communication, and as semiconductors continue to scale, more and more of them will be found on the same chip. Instead of reinventing the wheel, it is imperative that semiconductor companies agree on a standard software API that offers high efficiency on the one hand and the ability to build and reuse application software on the other. The need for such a standardized API is underscored by the emergence of on-chip networks (as opposed to buses) as the physical communication mechanism. Thus the standard API must be able to mimic the functionality offered by MPI and threads, but in a much more lightweight manner, and in a manner that meshes well with both existing (bus-based) and emerging (network-based) hardware transport mechanisms.

This paper describes an effort to merge these two trends: sophisticated transport mechanisms in hardware, driven by high-performance software based on a reusable standard API. This effort is part of the emerging multicore communication API (MCAPI), led by over two dozen Multicore Association (MCA) member companies. There are also a few university members in the MCA, including our own research group. Our primary goal in joining the MCA (by paying an annual membership fee) has been to observe the creation of MCAPI at an early stage, understand the motivations behind it, and develop meaningful formal methods solutions for this area. This paper describes our efforts to understand the hardware design of MCAPI thoroughly (a companion paper describes our efforts to understand the software formal verification needs of this area). Our approach followed the plea "get real; get physical": one could endlessly tout the virtues of MCAPI, but unless one has an actual implementation of MCAPI in silicon, one cannot assess its true merits. This paper represents the following ambitious journey embarked on by one graduate student and his advisor: design a nine-core, MIPS-based MCAPI fabric on an FPGA; modify the MIPS core ISA to support MCAPI primitives efficiently; and demonstrate this subsystem and release it into the public domain so that everyone in the community can experiment with an actual MCAPI in silicon. Thereafter, we plan several follow-on projects: (i) write real MCAPI applications in C, compile them, and run them on our target; (ii) write a Bluespec specification of our architecture and re-derive our hardware; and (iii) apply IBM's SixthSense tools to formally verify our hand-designed MCAPI in silicon. (Note: thanks to IBM, we are, as far as we know, the only university to hold a license for IBM's SixthSense tools.
It should be possible for other universities to benefit from the results IBM observes with SixthSense in our group. Our work is also being presented as a poster at the Multi-core Expo in Spring 2009.)

MCAPI is a lightweight message passing specification targeted at embedded SoCs. It has been shown in [2, 3, 8], among many other publications, that on-chip networks are a necessary direction in parallel architectures to circumvent the issues associated with rapidly increasing wire delays and with the latency caused by contention for shared buses. Any future concurrent system should therefore have some form of scalable interconnection network, and this is the main hardware design focus of this work. A high-throughput, low-latency on-chip network has been designed with MCAPI and embedded applications in mind. The design is described in Section 3, and the communication performance it achieves in Section 5.

The phases of our long-term work are the following. First, we offer a detailed assessment of MCAPI by providing the first public-domain design (on an FPGA) of an MCAPI-based communication architecture. This design consists of nine modified MIPS cores connected through a wormhole-routed NoC fabric. The second contribution will be to formally verify parts of our implementation using the IBM SixthSense tools. The third contribution will be to re-derive parts of our design using the Bluespec language and compilation system. Because of the scale of these steps, what we have concretely achieved so far is the first of these three phases.

2. Multicore API implementation

2.1. MCAPI Overview

The Multicore Communication API is a message passing interface similar to MPI. However, it is designed primarily for embedded devices, where broad functionality is less important than high performance in a few types of communication.
MCAPI provides communication primitives that operating systems, libraries, and applications can use to improve code portability across hardware generations. Since the API is designed for on-chip communication, it makes few assumptions about the hardware architecture and leaves the implementation considerable freedom to exploit whatever optimizations the architecture permits. For example, if shared memory is available, data transmission can be implemented as pointer passing, which eliminates the unnecessary copies that other solutions may require. MCAPI defines two types of communication. The first is connected packet and scalar channels: the user defines two endpoints and a communication link between them, after which data can be sent on this connection with very high throughput. The second is connectionless messages, which do not require a connection between two endpoints to be established; they can be sent to any other endpoint. However, they incur greater transmission overhead, so their throughput is lower. While the aspirations of the API sound impressive, realizing this potential depends largely on the implementation.

2.2. MIPS ISA Extension

The core functionality of the implementation described in this paper is controlled through a set of RISC-style assembly instructions added as an extension to the MIPS instruction set. These instructions are given in Figure 1. The decision to make the communication hardware programmable stems from an effort to avoid overcomplicating the hardware.

[Fig. 1: MIPS ISA extension instructions]

2.3. Example Send/Receive

With these added instructions, MCAPI can be implemented as a C library with inline assembly code. Example implementations of the MCAPI message send and receive functions are given in Figures 2 and 3, respectively. For brevity, some error-checking code has been omitted. Note that the code size of these functions is relatively small, which is critical to keeping the memory footprint of the library small enough for embedded applications.

Using these instructions, all of the communication functionality of MCAPI can be implemented. The send header instruction builds the packet header and sends it on the network. The header includes source and destination node identifiers, as well as a packet class that indicates the type of data being sent (pointer/buffer, short, integer, or long). The receive header instruction retrieves the packet header and writes it to a register; the header data can then be parsed using the bit-mask and shift instructions standard in the MIPS ISA. The send/receive word instructions send or receive word-length chunks of data. The get ID and get flag instructions write the local node ID and the specified network flag, respectively, to registers.
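The send-header and receive-header pair can be illustrated with a small Python behavioral model of packing and mask-and-shift parsing. This is a sketch, not the paper's encoding: the text names the head-flit fields (destination node and port, packet class, sender ID) and a 16-bit router data path, but the field order, the bit widths (4-bit node IDs are enough for a 9-core system), the numeric class codes, and the names `build_header` and `parse_header` are all assumptions.

```python
from enum import IntEnum


class PacketClass(IntEnum):
    # The four data types named in the text; the numeric codes are ASSUMED.
    POINTER = 0
    SHORT = 1
    INTEGER = 2
    LONG = 3


# ASSUMED 16-bit head-flit layout, as (shift, width) per field:
#   [15:12] destination node | [11:10] destination port
#   [9:8]   packet class     | [7:4]   sender ID | [3:0] unused
FIELDS = {
    "dest_node": (12, 4),
    "dest_port": (10, 2),
    "pkt_class": (8, 2),
    "sender": (4, 4),
}


def build_header(dest_node, dest_port, pkt_class, sender):
    """Pack the header fields into one 16-bit flit (hypothetical send-header model)."""
    values = {"dest_node": dest_node, "dest_port": dest_port,
              "pkt_class": int(pkt_class), "sender": sender}
    flit = 0
    for name, (shift, width) in FIELDS.items():
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        flit |= value << shift
    return flit


def parse_header(flit):
    """Unpack a head flit with mask-and-shift, as a MIPS SRL/ANDI sequence would."""
    fields = {name: (flit >> shift) & ((1 << width) - 1)
              for name, (shift, width) in FIELDS.items()}
    fields["pkt_class"] = PacketClass(fields["pkt_class"])
    return fields


header = build_header(dest_node=8, dest_port=0,
                      pkt_class=PacketClass.INTEGER, sender=3)
print(hex(header))  # 0x8230
print(parse_header(header))
```

Because each field occupies a fixed bit range, parsing needs only one shift and one mask per field, which is why no dedicated parse instruction is required.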
Available network flags are described in detail in the next section; they include the information needed to determine when a packet is available and when the network is busy, as well as some simple error checking.

[Fig. 2: example MCAPI message send function]

Both the send and receive functions can be expected to return very quickly. Since the underlying hardware supports zero-copy data transfers through pointer passing, the send function is very fast. The receive function contains loops that check whether data is available. These loops mainly ensure correctness, since a user is expected to call mcapi_msg_available before calling the receive function.
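The polling structure of the send and receive functions can be sketched as a Python behavioral model. Everything in it is an assumption for illustration: the NIU is reduced to one receive FIFO per node with no mesh timing; the instruction stand-ins (`send_header`, `send_word`, `recv_header`, `recv_word`, `get_flag`) and the helpers `msg_send`, `msg_available`, and `msg_recv` are hypothetical names that do not follow the MCAPI specification's signatures; and pointer passing is modeled by sending a buffer address as a single word.

```python
from collections import deque

HEADER_AVAILABLE = "header_available"
DATA_AVAILABLE = "data_available"


class NIU:
    """Toy network interface (ASSUMED model): one receive FIFO per node, no mesh timing."""

    def __init__(self, node_id, fabric):
        self.node_id = node_id
        self.fabric = fabric  # node_id -> NIU; stands in for the NoC
        self.rx = deque()     # received (kind, value) items in arrival order
        self.dest = None      # destination of the packet being sent

    # Hypothetical stand-ins for the added instructions.
    def get_flag(self, flag):
        if not self.rx:
            return False
        kind = self.rx[0][0]
        return ((flag == HEADER_AVAILABLE and kind == "head")
                or (flag == DATA_AVAILABLE and kind == "word"))

    def send_header(self, dest, pkt_class):
        self.dest = dest
        self.fabric[dest].rx.append(("head", (self.node_id, pkt_class)))

    def send_word(self, word):
        self.fabric[self.dest].rx.append(("word", word & 0xFFFFFFFF))

    def recv_header(self):
        return self._pop("head")

    def recv_word(self):
        return self._pop("word")

    def _pop(self, expected):
        kind, value = self.rx.popleft()
        if kind != expected:
            raise RuntimeError(f"expected {expected} item, got {kind}")
        return value


def msg_send(niu, dest, address):
    """Zero-copy send: only the buffer's address crosses the network."""
    niu.send_header(dest, "pointer")
    niu.send_word(address)


def msg_available(niu):
    return niu.get_flag(HEADER_AVAILABLE)


def msg_recv(niu, max_polls=1000):
    """Poll the flags, then read header and pointer; returns (sender, address)."""
    for _ in range(max_polls):
        if niu.get_flag(HEADER_AVAILABLE):
            break
    else:
        raise TimeoutError("no message arrived")
    sender, _pkt_class = niu.recv_header()
    for _ in range(max_polls):
        if niu.get_flag(DATA_AVAILABLE):
            return sender, niu.recv_word()
    raise TimeoutError("header arrived without a payload word")


fabric = {}
for node in range(9):
    fabric[node] = NIU(node, fabric)
msg_send(fabric[0], dest=8, address=0x10000040)
if msg_available(fabric[8]):
    sender, address = msg_recv(fabric[8])
    print(sender, hex(address))  # 0 0x10000040
```

The bounded loops play the role of the receive function's availability checks: a caller that first checks `msg_available` normally leaves both loops on their first iteration, so the polling costs almost nothing in the common case.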

3. On-Chip Network Design

3.1. Network Topology

As the physical communication medium, a two-dimensional grid network was designed to meet the data transport needs of MCAPI. Grid topologies have several properties that make them attractive for this type of application. First, they are highly scalable. In an N x N grid network the number of cores is N^2, while the worst-case communication latency in network hops grows only linearly in N: 2(N-1) hops, or about 2N. Compare this to a bus with a fair arbitration scheme, where the worst-case communication latency equals the number of cores. Thus, with respect to the number of cores, the worst-case latency of a grid network grows sub-linearly (roughly as twice the square root of the core count).

This reasoning does not even consider the second advantage of grid networks, which is short wire lengths. Increasing wire delays are becoming a major problem in designing chips in modern process technologies [4]. Grid networks partition communication paths such that adding more cores to a design does not increase the length of the wires.

[Fig. 4: Example NoC layout]

Figure 4 shows an example network layout similar to the one designed here. The only difference is that the diagram also shows accelerators, I/O interfaces, and L2 caches, illustrating the heterogeneous nature of systems that may use this type of MCAPI implementation. Note that the tiled layout of this network physically distributes the L2 cache, while logically the L2 cache is shared. This architecture has been proposed in [5].

3.2. Wormhole Router with Virtual Channels

The key hardware component in a scalable interconnect is the on-chip router. There are many types of routers, but wormhole routers with virtual channels have been shown in [6] to be efficient designs for on-chip networks. In a wormhole router, packets are divided into flits, which are passed through the network in a pipelined fashion. This means that routers only need buffer space for a small portion of a packet, which saves power and chip area, both critical design metrics in an embedded chip.

MCAPI packets are divided into flits as shown in Figure 5. The head flit consists of a destination node and port ID, a packet class, and the sender ID. In general, each core is a node. The port ID is included for future extensions that may support multiple endpoints per node. The sender ID is necessary so that the destination can determine where the packet came from. Different packet classes are used to implement the functionality of MCAPI: since data can be sent as scalar values of various bit widths and as pointers, the packet class tells the user how to interpret the received data. As the description of the implementation will show, opening a channel, and thus reserving network resources, is as simple as sending a header flit. Therefore, packet and scalar channels are implemented with the same instructions used to send connectionless messages; the only difference is that channel communication can consist of an arbitrary number of packets. As discussed below, this can negatively affect overall network performance.

[Fig. 5: Division of MCAPI packets into flits]

The on-chip router designed for this implementation resembles the block diagram in Figure 6. Its key features are five input and five output channels, each with two virtual channels (VCs); a single-cycle, 16-bit data path; fair round-robin channel arbitration; and deadlock-free dimension-order routing. The router's flow control is a simple token scheme in which each VC has buffer space for two flits. Once a VC holds a flit, it sets its token signal high to stall the pipeline. One flit's worth of buffer space is insufficient, because another flit may already have been sent in the same cycle in which the token is set high. To prevent buffer overflow, this single-cycle design therefore needs two buffer slots.

[Fig. 6: Router block diagram]

The key to permitting high throughput in a wormhole routing scheme is virtual channels. The flits of a packet can be strung out across the network. Since only the header flit contains routing information, the body flits must follow immediately behind the header, which stalls other packets contending for the same channel. This is why packet and scalar channels, which carry arbitrary lengths of data, can hurt overall network performance. Virtual channels allow these other packets to keep moving through the network by allocating resources to packets at the buffer level rather than at the physical channel. Since there are two VCs per physical channel, each node may open only a single packet or scalar channel at a time. This helps ensure that a VC will usually be available and places an upper bound on how long a packet may be stalled at each hop.

In this design, the virtual channel used for a packet is chosen based on token inputs and saturating counters that track network traffic on each physical channel. VC decisions can be made only for the header flit; subsequent flits must follow the header. Therefore, if two packets are sent sequentially across the same physical link, the second packet chooses the opposite VC from the first. This balances network load and improves average-case throughput and latency.
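The topology argument and the router's dimension-order routing can be made concrete with a short Python sketch. It assumes row-major node numbering (`id = y * N + x`), X-before-Y routing, and illustrative port names; the text states only that routing is dimension-order, and the function names are hypothetical. The last function checks by brute force that the worst-case distance is 2(N-1) hops.

```python
EAST, WEST, NORTH, SOUTH, LOCAL = "E", "W", "N", "S", "L"
STEP = {EAST: (1, 0), WEST: (-1, 0), NORTH: (0, 1), SOUTH: (0, -1)}


def coords(node, n):
    # ASSUMPTION: row-major numbering, node = y * n + x.
    return node % n, node // n


def xy_route(cur, dest, n):
    """Output port at router `cur`: correct X first, then Y (dimension-order)."""
    cx, cy = coords(cur, n)
    dx, dy = coords(dest, n)
    if dx > cx:
        return EAST
    if dx < cx:
        return WEST
    if dy > cy:
        return NORTH
    if dy < cy:
        return SOUTH
    return LOCAL  # arrived: deliver to the local core


def path(src, dest, n):
    """Routers visited from src to dest, inclusive."""
    visited = [src]
    cur = src
    while (port := xy_route(cur, dest, n)) != LOCAL:
        x, y = coords(cur, n)
        sx, sy = STEP[port]
        cur = (y + sy) * n + (x + sx)
        visited.append(cur)
    return visited


def worst_case_hops(n):
    """Longest route over all source/destination pairs on an n x n mesh."""
    cores = n * n
    return max(len(path(s, d, n)) - 1 for s in range(cores) for d in range(cores))


for n in (3, 4, 8):
    cores = n * n
    print(f"{cores} cores: mesh worst case {worst_case_hops(n)} hops, "
          f"bus worst case {cores}")
```

For the 9-core (3 x 3) configuration, `path(0, 8, 3)` visits routers `[0, 1, 2, 5, 8]`, and the worst case is 4 hops against 9 for a fairly arbitrated bus. Routing X before Y never creates a cyclic channel dependency on a mesh, which is what makes dimension-order routing deadlock-free.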
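Two of the router's allocation decisions, fair round-robin arbitration for an output channel and header-time VC selection, can be sketched in Python. Assumptions: the ring counter is modeled as a pointer that moves just past the last winner (the textbook round-robin policy; the router's exact update rule is not specified), and the VC rule keeps only the "alternate VCs, skip a stalled one" behavior, omitting the saturating traffic counters. The class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Grants one requesting input per cycle; priority rotates like a ring counter."""

    def __init__(self, num_inputs):
        self.n = num_inputs
        self.ptr = 0  # input currently holding highest priority

    def grant(self, requests):
        """Return the index of the granted input, or None if nobody requests."""
        for offset in range(self.n):
            i = (self.ptr + offset) % self.n
            if requests[i]:
                # ASSUMED policy: the winner becomes lowest priority next time.
                self.ptr = (i + 1) % self.n
                return i
        return None


class VCSelector:
    """Header-time choice between the two VCs of one physical output link."""

    def __init__(self):
        self.last_vc = 1  # so the first header prefers VC 0

    def choose(self, token_high):
        """token_high[v] is True when VC v is stalled downstream.

        Only header flits call this; body and tail flits reuse the header's VC.
        Saturating traffic counters are deliberately omitted (simplification).
        """
        preferred = 1 - self.last_vc  # alternate with the previous packet
        for vc in (preferred, 1 - preferred):
            if not token_high[vc]:
                self.last_vc = vc
                return vc
        return None  # both VCs stalled: the header waits this cycle


arbiter = RoundRobinArbiter(5)  # north, east, south, west, local inputs
requests = [True, False, True, False, True]
print([arbiter.grant(requests) for _ in range(6)])  # [0, 2, 4, 0, 2, 4]

selector = VCSelector()
print([selector.choose([False, False]) for _ in range(4)])  # [0, 1, 0, 1]
```

With three inputs requesting continuously, each is served once every three cycles, which is the fairness property the router relies on; the selector alternates VCs on an idle link and falls back to the other VC when the preferred one is stalled.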

The arbitration process is fairly simple and is based on the techniques described in [6]. In the first phase, the router data path computes the routing direction, latches the data, and sends the arbiter a request for the output channel. In the second phase, the arbiter sends a grant signal to the VC and sets the appropriate control signals for the crossbar, which forwards the data to the next router. When two flits want to use the same output channel, the decision is made based on the value of a ring counter. The goal of the arbiter is to forward as many flits as possible every clock cycle.

The router module is the main bottleneck on system clock speed. The decision to use a single-cycle router is part of an effort to achieve low latency, but such a router cannot be clocked as fast as a pipelined one. Because MCAPI targets embedded designs, and embedded chips have lower clock rates, this seemed a worthwhile trade-off. Research in [7] demonstrates several techniques for reducing the cycle time of single-cycle routers by taking the routing function and arbitration process off the critical path, which further justifies the use of a single-cycle router. However, it is very important for overall system clock speed that there be no combinational paths through the router, so that no signal propagates from one network node to another without being latched. If this were allowed, the system clock period would become the time taken to traverse multiple network hops.

3.3. Network Interface Unit

The network interface unit (NIU) provides the functionality needed to interface a modified MIPS processor to the grid network. A block diagram is given in Figure 7.

[Fig. 7: NIU block diagram]

The NIU consists of send and receive modules, send and receive buffers, and a register containing operation flags.
The send module builds flits and sends them on the network according to the bus inputs from the processor and the opcode; it also makes an initial decision about which VC to send the packet on. The receive module observes the front of each VC receive buffer, whose contents cause state transitions that determine whether the available data is a header, body, or tail flit. It then removes buffer entries when the appropriate opcode is seen. The flags include send and receive port busy flags, header and data available flags, send and receive error flags, and a network busy flag. These flags give the user the information needed to detect buffer overflow and network overload, and to determine when a message is available.

4. Synthesis and Verification

4.1. Synthesis for Virtex5

The target platform for this implementation is a Xilinx Virtex 5 FPGA. Programmable logic is used so that these designs can be reused in other research efforts exploring multicore system innovations. The synthesis results, in terms of device utilization and performance, are given in Table 1 for each module and for an entire 9-core system laid out similarly to the design in Figure 4. Note that the complete system excludes the L2 cache but does include 16 B of L1 cache. Future extensions may include a shared L2 cache, because the total device utilization of the existing 9-core system is only about 50%.

[Table 1: Synthesis results for the MIPS core, NIU, router, and complete 9-core NoC (columns: LUTs, registers, clock rate in MHz)]

4.2. Testing Methodology

Testing is performed by running test programs and observing the output waveforms in a VHDL simulator. An assembler has been created for this system, and an example program has been run successfully. However, this test program did not stress the network: it involved a simple producer/consumer communication pattern between two MIPS cores while the remaining cores were idle. More extensive applications are needed to observe the average-case communication latency accurately. Formal verification of critical hardware components is also important, but at the time of this writing no significant formal verification has been done on these designs.

5. Communication Performance

The best- and worst-case communication latencies, in clock cycles, are summarized by Equations 1 and 2. These give the latency of the header flit; subsequent flits follow directly behind the header.

Best case:
$$L = H + k \tag{1}$$

Worst case:
$$L = 5HS + k \tag{2}$$

In both equations, $L$ is the latency in cycles, $H$ is the number of hops, $S$ is the average size in flits of other packets, and $k$ is a constant derived from other, non-communication instructions in the implementation. In an initial test on a lightly loaded network, a single flit traveling across 3 hops took 9 cycles; this represents the best-case latency. Due to the design characteristics of the network, the worst case is very unlikely: for it to occur, 5 packets would need to arrive at each router at the same time, and at each router the flit being sent would have to be last in the arbitration schedule. It has been observed that the average-case latency is much closer to the best-case latency.

For connected packet and scalar channels, the body packets achieve best-case latency. Throughput also increases, because there is much less header/tail overhead, and because network resources are reserved when the channel is opened, a flit can progress through the connection every clock cycle. At 130 MHz, the throughput of a channel connection is about 260 MB/s, because each body flit is 2 bytes (130 x 10^6 flits/s x 2 bytes).

6. Conclusions and Future Work

The work shown here demonstrates the viability and usefulness of hardware support for inter-core communication. Even in an FPGA implementation, low communication latency and high throughput (260 MB/s) are possible.
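The latency equations and the 260 MB/s figure from Section 5 can be checked with a few lines of Python. One assumption is needed: the constant k is not stated, so this sketch infers it from the single reported measurement (9 cycles over 3 hops) by assuming that measurement follows Equation 1, which gives k = 6. The average packet size S = 4 used in the worst-case call is purely hypothetical, and the function names are illustrative.

```python
def infer_k(measured_cycles, hops):
    """Solve Equation 1 (L = H + k) for k from one best-case measurement."""
    return measured_cycles - hops


def best_case_latency(hops, k):
    return hops + k  # Equation 1


def worst_case_latency(hops, avg_other_packet_flits, k):
    return 5 * hops * avg_other_packet_flits + k  # Equation 2


def channel_throughput(clock_hz, flit_bytes=2):
    """Bytes per second on an open channel: one 2-byte body flit per cycle."""
    return clock_hz * flit_bytes


K = infer_k(measured_cycles=9, hops=3)  # ASSUMED to follow Equation 1: K = 6
print(best_case_latency(4, K))          # 10: corner to corner on a 3 x 3 mesh
print(worst_case_latency(4, 4, K))      # 86, with a hypothetical S = 4
print(channel_throughput(130_000_000) / 1e6, "MB/s")  # 260.0 MB/s
```

The gap between 10 and 86 cycles for the same 4-hop route shows why contention, not distance, dominates the worst case, and why the observation that average latency stays near the best case matters.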
Current research in on-chip network design provides solutions for minimizing cost and power while improving performance. These innovations will continue and will further justify implementing communication APIs in hardware. What has been presented here is an efficient and scalable concurrent computing platform with greater potential for future innovation than existing solutions.

Several immediate directions for future work related to this project include the evaluation of IBM's SixthSense VHDL model checker and Bluespec's automated design tools using the hardware designs presented in this paper, and the development of automated network synthesis and optimization algorithms for MCAPI workloads. Evaluation of SixthSense has already begun; a tutorial for using the tool has been created and is available in [11]. The key units in this design that need formal verification are the arbitration and VC allocation units of the router module, because it is difficult to create workloads that stress the network. For correctness, it must be verified that a packet is never sent on a VC being used by another packet until that packet releases the resource. This is an ideal case study for hardware verification, because it tests a situation that traditional test vectors would be unlikely to catch.

7. Acknowledgments

This work has been funded by SRC task ID

References

[1] Multicore Association Communication API Specification V1.063,
[2] M. Ali, M. Welzl, M. Zwicknagl, Networks on Chips: Scalable Interconnects for Future Systems on Chips, IEEE
[3] R. Das, S. Eachempati, A. K. Mishra, V. Narayanan, C. R. Das, Design and Evaluation of a Hierarchical On-Chip Interconnect for Next-Generation CMPs, HPCA
[4] R. Ho, K. Mai, M. Horowitz, The Future of Wires, Proceedings of the IEEE, April 2001, pp
[5] N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki, R-NUCA: Data Placement in Distributed Shared Caches, ISCA
[6] E. Shin, V. Mooney, G. Riley, Round-Robin Arbiter Design and Generation, ISSS 2002
[7] Mullins, West, Moore, Low-Latency Virtual-Channel Routers for On-Chip Networks, ISCA
[8] W. Dally, B. Towles, Route Packets, Not Wires: On-Chip Interconnection Networks, Proceedings of IEEE Design Automation Conference, 2001, pp
[9] V. Dvorak, Communication Performance of Mesh and Ring Based NoC's, 7th International Conference on Networking, 2008, pp
[10] I. Nousias, T. Arslan, Wormhole Routing with Virtual Channels using Adaptive Rate Control for Network-on-Chip (NoC), Proceedings of 1st NASA/ESA Conference on Adaptive Hardware and Systems, 2006, pp
[11] B. Meakin, SixthSense Tutorial, p/sixthsense


CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers Stavros Volos, Ciprian Seiculescu, Boris Grot, Naser Khosro Pour, Babak Falsafi, and Giovanni De Micheli Toward

More information

DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER

DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER G MAHESH BABU, et al, Volume 2, Issue 7, PP:, SEPTEMBER 2014. DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER G.Mahesh Babu 1*, Prof. Ch.Srinivasa Kumar 2* 1. II. M.Tech (VLSI), Dept of ECE,

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Lecture 2: Topology - I

Lecture 2: Topology - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and

More information

NoC Simulation in Heterogeneous Architectures for PGAS Programming Model

NoC Simulation in Heterogeneous Architectures for PGAS Programming Model NoC Simulation in Heterogeneous Architectures for PGAS Programming Model Sascha Roloff, Andreas Weichslgartner, Frank Hannig, Jürgen Teich University of Erlangen-Nuremberg, Germany Jan Heißwolf Karlsruhe

More information

Interconnection Networks

Interconnection Networks Lecture 18: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of these slides were created by Michael Papamichael This lecture is partially

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

Ultra-Fast NoC Emulation on a Single FPGA

Ultra-Fast NoC Emulation on a Single FPGA The 25 th International Conference on Field-Programmable Logic and Applications (FPL 2015) September 3, 2015 Ultra-Fast NoC Emulation on a Single FPGA Thiem Van Chu, Shimpei Sato, and Kenji Kise Tokyo

More information

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel Hyoukjun Kwon and Tushar Krishna Georgia Institute of Technology Synergy Lab (http://synergy.ece.gatech.edu) hyoukjun@gatech.edu April

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013 NetSpeed ORION: A New Approach to Design On-chip Interconnects August 26 th, 2013 INTERCONNECTS BECOMING INCREASINGLY IMPORTANT Growing number of IP cores Average SoCs today have 100+ IPs Mixing and matching

More information

Lecture 7: Flow Control - I

Lecture 7: Flow Control - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 7: Flow Control - I Tushar Krishna Assistant Professor School of Electrical

More information

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK DOI: 10.21917/ijct.2012.0092 HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK U. Saravanakumar 1, R. Rangarajan 2 and K. Rajasekar 3 1,3 Department of Electronics and Communication

More information

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC

DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC DESIGN OF EFFICIENT ROUTING ALGORITHM FOR CONGESTION CONTROL IN NOC 1 Pawar Ruchira Pradeep M. E, E&TC Signal Processing, Dr. D Y Patil School of engineering, Ambi, Pune Email: 1 ruchira4391@gmail.com

More information

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect Chris Fallin, Greg Nazario, Xiangyao Yu*, Kevin Chang, Rachata Ausavarungnirun, Onur Mutlu Carnegie Mellon University *CMU

More information

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques Nandini Sultanpure M.Tech (VLSI Design and Embedded System), Dept of Electronics and Communication Engineering, Lingaraj

More information

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

Design and Implementation of Buffer Loan Algorithm for BiNoC Router Design and Implementation of Buffer Loan Algorithm for BiNoC Router Deepa S Dev Student, Department of Electronics and Communication, Sree Buddha College of Engineering, University of Kerala, Kerala, India

More information

ECE/CS 757: Advanced Computer Architecture II Interconnects

ECE/CS 757: Advanced Computer Architecture II Interconnects ECE/CS 757: Advanced Computer Architecture II Interconnects Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes created by Natalie Enright Jerger Lecture Outline Introduction

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER A Thesis by SUNGHO PARK Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Philipp Gorski, Tim Wegner, Dirk Timmermann University

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip ASP-DAC 2010 20 Jan 2010 Session 6C Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip Jonas Diemer, Rolf Ernst TU Braunschweig, Germany diemer@ida.ing.tu-bs.de Michael Kauschke Intel,

More information

Network-on-Chip Architecture

Network-on-Chip Architecture Multiple Processor Systems(CMPE-655) Network-on-Chip Architecture Performance aspect and Firefly network architecture By Siva Shankar Chandrasekaran and SreeGowri Shankar Agenda (Enhancing performance)

More information

Lecture 22: Router Design

Lecture 22: Router Design Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO 03, Princeton A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip

More information

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger Interconnection Networks: Flow Control Prof. Natalie Enright Jerger Switching/Flow Control Overview Topology: determines connectivity of network Routing: determines paths through network Flow Control:

More information

ISSN Vol.03, Issue.02, March-2015, Pages:

ISSN Vol.03, Issue.02, March-2015, Pages: ISSN 2322-0929 Vol.03, Issue.02, March-2015, Pages:0122-0126 www.ijvdcs.org Design and Simulation Five Port Router using Verilog HDL CH.KARTHIK 1, R.S.UMA SUSEELA 2 1 PG Scholar, Dept of VLSI, Gokaraju

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers Young Hoon Kang, Taek-Jun Kwon, and Jeff Draper {youngkan, tjkwon, draper}@isi.edu University of Southern California

More information

Flow Control can be viewed as a problem of

Flow Control can be viewed as a problem of NOC Flow Control 1 Flow Control Flow Control determines how the resources of a network, such as channel bandwidth and buffer capacity are allocated to packets traversing a network Goal is to use resources

More information

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks Lecture: Transactional Memory, Networks Topics: TM implementations, on-chip networks 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design white paper Intel FPGA Applying the Benefits of on a Chip Architecture to FPGA System Design Authors Kent Orthner Senior Manager, Software and IP Intel Corporation Table of Contents Abstract...1 Introduction...1

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Maheswari Murali * and Seetharaman Gopalakrishnan # * Assistant professor, J. J. College of Engineering and Technology,

More information

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems

Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems 1 Swizzle Switch: A Self-Arbitrating High-Radix Crossbar for NoC Systems Ronald Dreslinski, Korey Sewell, Thomas Manville, Sudhir Satpathy, Nathaniel Pinckney, Geoff Blake, Michael Cieslak, Reetuparna

More information

SONA: An On-Chip Network for Scalable Interconnection of AMBA-Based IPs*

SONA: An On-Chip Network for Scalable Interconnection of AMBA-Based IPs* SONA: An On-Chip Network for Scalable Interconnection of AMBA-Based IPs* Eui Bong Jung 1, Han Wook Cho 1, Neungsoo Park 2, and Yong Ho Song 1 1 College of Information and Communications, Hanyang University,

More information

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects

EECS 598: Integrating Emerging Technologies with Computer Architecture. Lecture 12: On-Chip Interconnects 1 EECS 598: Integrating Emerging Technologies with Computer Architecture Lecture 12: On-Chip Interconnects Instructor: Ron Dreslinski Winter 216 1 1 Announcements Upcoming lecture schedule Today: On-chip

More information

On RTL to TLM Abstraction to Benefit Simulation Performance and Modeling Productivity in NoC Design Exploration

On RTL to TLM Abstraction to Benefit Simulation Performance and Modeling Productivity in NoC Design Exploration On to TLM Abstraction to Benefit Simulation Performance and Modeling Productivity in NoC Design Exploration Sven Alexander Horsinka, Rolf Meyer, Jan Wagner, Rainer Buchty and Mladen Berekovic TU Braunschweig,

More information

LOW POWER REDUCED ROUTER NOC ARCHITECTURE DESIGN WITH CLASSICAL BUS BASED SYSTEM

LOW POWER REDUCED ROUTER NOC ARCHITECTURE DESIGN WITH CLASSICAL BUS BASED SYSTEM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 5, May 2015, pg.705

More information

Fast Flexible FPGA-Tuned Networks-on-Chip

Fast Flexible FPGA-Tuned Networks-on-Chip This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Fast Flexible FPGA-Tuned Networks-on-Chip Michael K. Papamichael, James C. Hoe

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Module 17: Interconnection Networks Lecture 37: Introduction to Routers Interconnection Networks. Fundamentals. Latency and bandwidth Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012

More information

A Novel Energy Efficient Source Routing for Mesh NoCs

A Novel Energy Efficient Source Routing for Mesh NoCs 2014 Fourth International Conference on Advances in Computing and Communications A ovel Energy Efficient Source Routing for Mesh ocs Meril Rani John, Reenu James, John Jose, Elizabeth Isaac, Jobin K. Antony

More information

Design of Synchronous NoC Router for System-on-Chip Communication and Implement in FPGA using VHDL

Design of Synchronous NoC Router for System-on-Chip Communication and Implement in FPGA using VHDL Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.

More information

OASIS NoC Architecture Design in Verilog HDL Technical Report: TR OASIS

OASIS NoC Architecture Design in Verilog HDL Technical Report: TR OASIS OASIS NoC Architecture Design in Verilog HDL Technical Report: TR-062010-OASIS Written by Kenichi Mori ASL-Ben Abdallah Group Graduate School of Computer Science and Engineering The University of Aizu

More information

Design and Simulation of Router Using WWF Arbiter and Crossbar

Design and Simulation of Router Using WWF Arbiter and Crossbar Design and Simulation of Router Using WWF Arbiter and Crossbar M.Saravana Kumar, K.Rajasekar Electronics and Communication Engineering PSG College of Technology, Coimbatore, India Abstract - Packet scheduling

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

A unified multicore programming model

A unified multicore programming model A unified multicore programming model Simplifying multicore migration By Sven Brehmer Abstract There are a number of different multicore architectures and programming models available, making it challenging

More information

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011

CS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011 CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 9 th, 2011 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Quality-of-Service for a High-Radix Switch

Quality-of-Service for a High-Radix Switch Quality-of-Service for a High-Radix Switch Nilmini Abeyratne, Supreet Jeloka, Yiping Kang, David Blaauw, Ronald G. Dreslinski, Reetuparna Das, and Trevor Mudge University of Michigan 51 st DAC 06/05/2014

More information

Using Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology

Using Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology Using Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology September 19, 2007 Markus Levy, EEMBC and Multicore Association Enabling the Multicore Ecosystem Multicore

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Bhavya K. Daya, Li-Shiuan Peh, Anantha P. Chandrakasan Dept. of Electrical Engineering and Computer

More information

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems Mohammad Ali Jabraeil Jamali, Ahmad Khademzadeh Abstract The success of an electronic system in a System-on- Chip is highly

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture On-chip Networking Prof. Michel A. Kinsy Virtual Channel Router VC 0 Routing Computation Virtual Channel Allocator Switch Allocator Input Ports VC x VC 0 VC x It s a system

More information

CS/COE1541: Intro. to Computer Architecture

CS/COE1541: Intro. to Computer Architecture CS/COE1541: Intro. to Computer Architecture Multiprocessors Sangyeun Cho Computer Science Department Tilera TILE64 IBM BlueGene/L nvidia GPGPU Intel Core 2 Duo 2 Why multiprocessors? For improved latency

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

Interconnection Networks

Interconnection Networks Lecture 15: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Credit: some slides created by Michael Papamichael, others based on slides from Onur Mutlu

More information

Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors

Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors Meet in the Middle: Leveraging Optical Interconnection Opportunities in Chip Multi Processors Sandro Bartolini* Department of Information Engineering, University of Siena, Italy bartolini@dii.unisi.it

More information

A NEW ROUTER ARCHITECTURE FOR DIFFERENT NETWORK- ON-CHIP TOPOLOGIES

A NEW ROUTER ARCHITECTURE FOR DIFFERENT NETWORK- ON-CHIP TOPOLOGIES A NEW ROUTER ARCHITECTURE FOR DIFFERENT NETWORK- ON-CHIP TOPOLOGIES 1 Jaya R. Surywanshi, 2 Dr. Dinesh V. Padole 1,2 Department of Electronics Engineering, G. H. Raisoni College of Engineering, Nagpur

More information

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology Outline SoC Interconnect NoC Introduction NoC layers Typical NoC Router NoC Issues Switching

More information

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed

More information

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU Thomas Moscibroda Microsoft Research Onur Mutlu CMU CPU+L1 CPU+L1 CPU+L1 CPU+L1 Multi-core Chip Cache -Bank Cache -Bank Cache -Bank Cache -Bank CPU+L1 CPU+L1 CPU+L1 CPU+L1 Accelerator, etc Cache -Bank

More information

Lecture 23: Router Design

Lecture 23: Router Design Lecture 23: Router Design Papers: A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks, ISCA 06, Penn-State ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Brief Background in Fiber Optics

Brief Background in Fiber Optics The Future of Photonics in Upcoming Processors ECE 4750 Fall 08 Brief Background in Fiber Optics Light can travel down an optical fiber if it is completely confined Determined by Snells Law Various modes

More information

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels Lecture: Interconnection Networks Topics: TM wrap-up, routing, deadlock, flow control, virtual channels 1 TM wrap-up Eager versioning: create a log of old values Handling problematic situations with a

More information

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari Global Journal of Computer Science and Technology: E Network, Web & Security Volume 15 Issue 6 Version 1.0 Year 2015 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

Architecture and Design of Efficient 3D Network-on-Chip for Custom Multi-Core SoC

Architecture and Design of Efficient 3D Network-on-Chip for Custom Multi-Core SoC BWCCA 2010 Fukuoka, Japan November 4-6 2010 Architecture and Design of Efficient 3D Network-on-Chip for Custom Multi-Core SoC Akram Ben Ahmed, Abderazek Ben Abdallah, Kenichi Kuroda The University of Aizu

More information

Interconnect Technology and Computational Speed

Interconnect Technology and Computational Speed Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented

More information

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on on-chip Donghyun Kim, Kangmin Lee, Se-joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory, Dept. of EECS, Korea Advanced

More information

Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

More information