Comparing Interconnection Models in an On-Chip Reconfigurable Multiprocessor

Rodrigo Soares, Sérgio Queiroz de Medeiros, Ivan Saraiva Silva, David Déharbe
Universidade Federal do Rio Grande do Norte
Departamento de Informática e Matemática Aplicada
[rodrigo, sergio]@consiste.dimap.ufrn.br, [ivan, david]@dimap.ufrn.br

Abstract

The increasing complexity of present SoCs demands new, scalable, reusable, parallel interconnection models for their cores. This paper presents a comparison study of interconnection models for an on-chip reconfigurable multiprocessor, the X4CP32. Three models were proposed: a bus system, a NoC using FIFO buffering, and a NoC using SAFC buffering. All the models were described in SystemC and simulated. Results show a large performance difference between the NoCs and the bus system.

1. Introduction

Reconfigurable Architectures (RAs) have grown as a subject of research and in importance over the past few years. Most microelectronics conferences now have a special session for RAs, which is evidence of their current popularity. RAs are composed of two major elements: reconfigurable units and interconnection. Many studies have focused on the hierarchy of RAs and on many aspects of the reconfigurable units, but little has been done to improve the other, equally vital element: the interconnection. Even the best RA model will not perform satisfactorily unless it solves the interconnection bottleneck.

In some RAs the interconnection is a single bus linking all reconfigurable units. That is a simple and economical solution, but not a good one. A single bus, despite its low silicon cost, allows only one communication at a time; it is not scalable, which heavily limits the number of reconfigurable units; its limited bandwidth is shared among all the units, which sets an upper limit on the RA's performance; and it depends on sequential communication. To overcome some of these problems, many RAs use hierarchical bus architectures, such as ARM AMBA [1], but the bandwidth is still limited.

As an alternative to the traditional bus system, we propose the use of a Network-on-Chip (NoC) to interconnect a RA. The NoC has many known advantages: scalability, reusability, parallelism in communication, asynchronous communication, pipelined communication channels, and a high degree of customization. Most, if not all, of these advantages are desirable in an interconnection system, especially in a RA. The major problem with NoCs is still their high area cost, which cannot be afforded in some cases.

This paper is organized as follows: Section 2 presents the state of the art of interconnection; Section 3 presents the X4CP32 architecture; Section 4 discusses different interconnection schemes for the X4CP32; Section 5 presents the results of the SystemC simulations; and Section 6 presents the conclusion.

2. State of the Art

This section presents some of the interconnection systems currently described in the literature.

2.1. Communication in Reconfigurable Architectures

FPGAs are still the best-known example of RAs, to the point that some people think they are the only kind of RA in existence. These architectures mostly use global and local interconnections; a typical example is Altera's FLEX 10K family [2]. The communication system in FPGAs is based on crossbar structures that form a static connection until the next reconfiguration. Most fine-grained RAs use point-to-point connections between neighboring PEs (Processing Elements) [3, 4, 5].

Current coarse-grained architectures have as many communication demands as a multiprocessor environment. An example of a coarse-grained RA is the PACT XPP, which uses vertical and horizontal buses to transmit data packets. For control packets, the XPP has a single shared bus for each PAC (a coarse structure containing a grid of RUs).

2.2. Networks-on-Chip

In the near future a single chip is expected to contain billions of transistors [6], which probably means hundreds of cores in a single chip. Yet, as far as we know, none of the current RAs uses NoCs or any similar structure for communication.

The Network-on-Chip [7, 8, 9] is a network communication structure composed of many interconnected devices called routers. In direct topology networks, each router is associated with a core (here, with an RU). It forwards packets sent by that core towards other routers until they reach their destination, and it delivers packets addressed to its associated core.

There are three major reasons why NoCs should be used in a multi-core microchip. The first is performance. Although a structure of dedicated channels performs better, it is practically unviable in a system with hundreds of cores. The high performance of a NoC is due to its parallel and pipelined communication. The second is scalability. Unlike other communication structures, a NoC can easily be adapted to any number of cores, and adding cores does not degrade its performance. The third is reuse. The same NoC can be used in a wide range of architectures without adaptation, as long as the cores and the NoC use the same communication protocol. That saves development and testing time and allows faster design space exploration.

However, NoCs are not without drawbacks. In a direct topology, where every core is connected to a router, the area overhead of a NoC interconnection system is the overhead of a single router multiplied by the number of cores. The pipelined communication of NoCs adds latency to messages, which may be too high depending on the distance between the communicating cores. Also, a poor choice of routing, buffering and switching mechanisms can cause anything from performance loss to a total stall due to deadlock.

In spite of these drawbacks, NoCs emerge as the only feasible communication model for large systems. A model of full interconnection between a core and its neighbors not only adds a large port overhead to the cores, but is also inefficient when messages are sent to distant cores. A bus provides latency-free communication, but its scalability is limited to a few dozen cores [10]. A hierarchical bus system needs bridges between buses, which adds area overhead, latency and performance loss for communication between distant cores.

3. An On-Chip Reconfigurable Multiprocessor Architecture

The Reconfigurable Architecture chosen for the combination with a Network-on-Chip is the X4CP32 [11, 12, 13, 14]. The X4CP32 architecture is composed of two basic elements: the cell and the RPU (Reconfigurable and Programming Unit). Each RPU contains 4 cells, which gives it the autonomy to keep its own program flow. The processor is a grid of RPUs, distributed in rows and columns, as seen in Figure 1. Every RPU is connected to all other RPUs in the same row and column through a bus. This bus also connects the RPUs to the external memory.

Figure 1 RPU connections
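The row/column bus connectivity described above can be illustrated with a minimal sketch. The function name `bus_peers`, the (row, column) coordinate convention, and the 3x3 example size are assumptions made for illustration; they are not part of the X4CP32 specification.

```python
def bus_peers(rpu, rows, cols):
    """Return the RPUs that share a row bus or a column bus with `rpu`.

    `rpu` is a hypothetical (row, column) coordinate in a rows x cols grid.
    """
    r, c = rpu
    same_row = [(r, other_c) for other_c in range(cols) if other_c != c]
    same_col = [(other_r, c) for other_r in range(rows) if other_r != r]
    return same_row + same_col


# Example: in a 3x3 grid, the central RPU reaches four peers over its buses.
print(bus_peers((1, 1), rows=3, cols=3))
```

Every RPU outside that list can only be reached through an intermediate RPU or the external memory, which is one reason the bus model serializes traffic as the grid grows.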

3.1. The RPU

The RPU is the highest-level entity in the X4CP32. It is responsible for its own reconfiguration and parallel processing. The RPU consists of 4 cells, an internal memory (G-MEM), an Instruction Memory (I-MEM), a Communication Buffer, an internal bus, a Bus Arbiter and control logic. The I-MEM is located inside the top-left Cell. Each RPU is connected to its neighbors through the neighbor connections of its cells. It is also connected to every other RPU in the same row and column through the Row/Column buses. These connections can be seen in Figure 1. Figure 2 shows the RPU architecture.

The G-MEM is the RPU memory block, with 64k positions of 32 bits each. All components of the RPU can read data from and write data to the G-MEM. The Bus Arbiter is the access controller of the Internal Bus; it manages the Internal Bus access protocol and priorities. The Communication Buffer is the interface for inter-RPU communication (communication with other RPUs and with the main memory).

Figure 2 RPU internal view

3.2. Execution Modes

The Execution Mode is the way the X4CP32 achieves its hybridism. The Execution Mode defines the behavior of the RPU. There are two Execution Modes: the Programming Execution Mode and the Reconfigurable Execution Mode.

In Programming Execution Mode, the RPU acts as a parallel processor. The top-left Cell assumes the Processor Operation Mode. The other Cells assume the Dynamic ALU Operation Mode and execute the instructions sent by the top-left Cell. Thus, the RPU has 3 parallel processing units controlled by a fourth one. A compiler can implement a methodology to find the best way to distribute the instructions among the Dynamic ALU Cells, exploiting the architecture to the fullest.

In Reconfigurable Execution Mode, the RPU sets all of its Cells to Static ALU Operation Mode and configures the inputs, operations, outputs and routing of each Cell, building a systolic data path, just like conventional reconfigurable architectures. When its inputs are ready, a Cell operates on them and writes the result to its output port, so that a neighbor Cell can read it.

3.3. The Cell

Figure 3 Cell data path

The cell is a microprocessor with an instruction set common to most general-purpose microprocessors, plus a few configuration-specific instructions. It is connected to 5 other cells through half-duplex buses. These connections are called ports. The cell also has an ALU, capable of performing basic logic and arithmetic operations on fixed-point and floating-point numbers, a 32-position register bank (C-MEM) and a 64-position stack (C-LIFO). Figure 3 shows an example of the cell data path. The synthesis results of the cell are presented in Table 1.

Table 1 Synthesis Results of the Cell

Unit                  Clock (MHz)   Area (# LC)
ALU                   70            2,307
Cells                 50            3,739
PC Cell               50            4,256
PC Cell Controller    59            1,404
Cells Controller      [...]         [...]
RPU                   49            15,[...]

3.4. Operation Modes

To implement the hybrid runtime-reconfigurable/parallel paradigm, the Cell was designed to match the specifications of both paradigms. It can operate in three different modes, which are the key to the desired hybridism.

In Processor Operation Mode, the Cell is the program flow control unit. The Cell fetches instructions from the I-MEM and checks which Cell each instruction is intended for. If the instruction is for another Cell, it sends the instruction through a predefined port to the destination Cell. Otherwise, it executes the instruction itself.

In Dynamic ALU Operation Mode, the Cell receives instructions through a predefined port, which depends on its position in the RPU. It executes each instruction and waits for the next one.

In Static ALU Operation Mode, the Cell is set to work as a data-driven ALU. The operation, the inputs (ports, ACC or a special memory position) and the output (a port) are configured, and the Cell operates whenever valid data is present, until it receives another reconfiguration instruction.

4. X4CP32's Interconnection

4.1. Bus System (R2R)

The RPUs in the same row and in the same column are all connected through buses. They communicate using a simple, peer-to-peer oriented protocol, the R2R (RPU-to-RPU). A SystemC [15] implementation of the protocol and the results obtained can be seen in [16].

The fundamental principle of the R2R is to ensure that an RPU receives data in the same order in which they were originally generated. The need is easy to understand: in a parallel/reconfigurable environment, every time an RPU needs a piece of data, it has to be sure that the data it receives is exactly what it needs for correct execution. That could be achieved in a number of ways, such as labeling every piece of data, but labels would take more bits to store and transmit. The R2R Protocol eliminates this necessity. Running the R2R Protocol does not require extra processing from the RPU: the Communication Buffer runs the whole protocol and works independently of the RPU control.

However, with the increasing popularity of SoCs (Systems-on-Chip) [17, 18], and possibly of reconfigurable SoCs, it is necessary to evaluate the use of NoCs in coarse-grained or polymorphous reconfigurable architectures.

4.2. NoCX4

The NoCX4 [19] is a Network-on-Chip especially designed to be integrated with the X4CP32. Like several NoC models, it consists of a grid of routers. Each router has 5 ports: north, south, east and west, which link the router to its four neighboring routers, and the node port, which connects the router to a module, in this case the RPU. The NoCX4 uses XY routing for its simplicity and deadlock prevention, despite its drawbacks. Packets are composed of a word-size (32-bit) header followed by 1 to 8 word-size data items, as seen in Figure 4.

Figure 4 NoCX4 Packet

The NoCX4 Router is simple, with only a few internal modules. It has 5 special FIFOs, which perform all the flow control and routing functions. Each FIFO has a buffer of 32-bit positions and a 5-state control logic. Thanks to its internal connections, the Crossbar allows 5 different data items to be routed at the same time. Every output port has an associated Arbiter. The Arbiters use a simple Round-Robin scheduling policy, with no priority. Before granting the datapath, an Arbiter evaluates how many positions are available in the destination FIFO and the data size of all requesting FIFOs. The Router architecture is shown in Figure 5. The Arbiters are not shown in the figure for lack of space, but one is associated with every multiplexer. Each FIFO is connected to all output ports except its own, because a packet never needs to return to where it came from.

Figure 5 NoCX4 Router Architecture

5. Simulations and Results

To compare the different communication models for the X4CP32, a set of simulations was run in SystemC. The simulations were performed on a 3x3 RPU grid. Given the scope of this paper, each RPU was replaced by a Communication Buffer encapsulated in a Wrapper, which issued the communication instructions read from the simulation files. The input simulation files were randomly generated according to specific required parameters, shown in Figure 6. Each square represents a Wrapper. The Send field gives the number of peers that receive data from that Wrapper; the Receive field gives the number of Wrappers that generated data for that Wrapper; and the Route field gives the number of routing paths that cross the router linked to that Wrapper. Note that data generated or received by a router's own Wrapper are not counted in the Route field. The number in parentheses is the percentage of sent, received and routed data over the total simulation dataflow.

Figure 6 Simulations Communication Parameters

Two versions of the NoCX4 were simulated. Both have the same characteristics, except for their buffering. The first version uses a single 32-position deep FIFO in each input port. In the second version, each FIFO is replaced by four 8-position deep SAFC (Statically-Allocated Fully Connected) buffers.
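The XY routing used by the NoCX4, and the way a Route count like the one in Figure 6 arises from it, can be sketched as follows. This is a minimal illustration, not the authors' SystemC model: the (x, y) coordinate convention, the function names, and the all-to-all traffic pattern in the example are assumptions, whereas the paper's simulation files were randomly generated.

```python
from collections import Counter
from itertools import permutations

GRID = 3  # the simulations use a 3x3 grid


def xy_route(src, dst):
    """Routers visited from src to dst (both included): X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # travel along the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # then travel along the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def route_counts(flows):
    """Per router, count the paths that cross it as an intermediate hop.

    Source and destination routers are excluded, mirroring the Route
    field, which ignores data generated or received by the router's
    own Wrapper.
    """
    counts = Counter()
    for src, dst in flows:
        for hop in xy_route(src, dst)[1:-1]:
            counts[hop] += 1
    return counts


nodes = [(x, y) for x in range(GRID) for y in range(GRID)]
all_to_all = list(permutations(nodes, 2))  # assumed traffic pattern
counts = route_counts(all_to_all)
print(xy_route((0, 0), (2, 1)))
print(counts[(1, 1)], counts[(0, 0)])
```

Because every packet finishes its X movement before turning into Y, no packet ever turns from Y back into X, which is the property that makes XY routing deadlock-free on a mesh. The sketch also shows that central routers carry more transit traffic than corner routers.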
Since both buffering schemes have the same number of positions, their areas are equivalent, as can be seen in Table 2, which gives the logic cell cost and operating frequency of the FIFO and SAFC models. The implementation was written in VHDL [20] and targeted FPGAs; it was compiled with Altera's Quartus II software. Despite the equivalent chip area, the results show that the introduction of virtual channels has a major impact on NoCX4 performance. Table 3 presents the synthesis results of the router, giving the number of logic cells needed by each unit and its frequency. The last row of the table gives the total cost of a router with five SAFC buffers and all the other elements in the table.

Table 2 Area cost and Frequency of Buffering Models

Buffer   Number of LCs   Frequency (MHz)
FIFO     [...]           [...],92
SAFC     [...]           [...],6
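A toy model can suggest why two buffers of equal capacity behave differently. The sketch below is a deliberate simplification and not the NoCX4 implementation: buffer depth, arbitration and multi-word packets are ignored, each output accepts at most one packet per cycle, and the function names and the `blocked` parameter are hypothetical.

```python
from collections import deque


def drain_fifo(packets, blocked):
    """Cycles needed to empty a single FIFO.

    `packets` lists the requested output of each packet in arrival order.
    `blocked[out]` is the number of initial cycles during which output
    `out` is busy. Only the packet at the head of the FIFO may leave.
    """
    queue = deque(packets)
    cycle = 0
    while queue:
        if cycle >= blocked.get(queue[0], 0):
            queue.popleft()
        cycle += 1
    return cycle


def drain_safc(packets, blocked):
    """Cycles needed to empty an SAFC-style buffer.

    The buffer is statically split into one queue per output, and each
    queue has its own path to its output, so a busy output only delays
    the packets that actually need it.
    """
    queues = {}
    for out in packets:
        queues.setdefault(out, deque()).append(out)
    cycle = 0
    while any(queues.values()):
        for out, queue in queues.items():
            if queue and cycle >= blocked.get(out, 0):
                queue.popleft()
        cycle += 1
    return cycle


arrivals = ["east", "north", "south", "west"]
busy = {"east": 3}  # the east output is busy for the first 3 cycles
print(drain_fifo(arrivals, busy), drain_safc(arrivals, busy))
```

In the single FIFO, the packet for the busy east output holds back three packets whose outputs are free (head-of-line blocking). In the partitioned buffer those three packets leave immediately, which is consistent with the later saturation point reported for the SAFC version.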

Table 3 Area cost and Frequency of the Router

Unit               Number of LCs   Frequency (MHz)
Buffering (SAFC)   [...]           [...]
Keying             [...]           [...]
Routing            [...]           [...]
Flux Control       1,[...]         [...]
Handshake          10              -
Router (Total)     5,[...]         [...]

A total of 15 simulations were performed for each communication model. The traffic rate varies from 5% to 25%, and 3 different packet sizes were simulated: 2, 4 and 8. The traffic rate is a percentage of 10,000 instructions for each Wrapper. This means that the simulation with a 25% traffic rate and packet size 8 contains 22,500 packets. The traffic rate corresponds to 18 kBytes of data per unit of packet size for each 5% of traffic.

The throughput results, in bytes/cycle, for the 3 communication models can be seen in Figure 7. These results consider only the payload of the packets. The R2R results, labeled BUS, never reach 1 byte/cycle. The R2R maintains an almost constant throughput in all cases, with a slight decrease as the packet size increases. The FIFO version of the NoCX4 shows a linear increase up to a saturation point, which varies with the packet size. Beyond that point, the throughput decreases. The SAFC version of the NoCX4 shows results similar to the FIFO version, with one important difference: its saturation point usually comes after that of the FIFO. As a result, the SAFC version reaches the highest throughput of the 3 models. At a 20% traffic rate the FIFO PCK 8 and the SAFC PCK 4 curves cross. Despite having twice as much header overhead and packets half as big, the version with virtual channels achieves much higher throughput.

A better comparison between the NoCX4 and the R2R is given in Figure 8, which shows the ending cycles, that is, the number of cycles necessary to send all the data and thus end the simulation. Only one version of the NoCX4, the FIFO, appears here, because showing the SAFC results as well would make the graph hard to read. The results are consistent with those seen in Figure 7.
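The traffic figures quoted in this section can be reproduced with a short calculation. The sketch assumes that each instruction produces one packet, that the packet size counts 32-bit (4-byte) data words, and that 1 kByte is 1,000 bytes; the constant and function names are illustrative only.

```python
WRAPPERS = 9                       # 3x3 grid, one Wrapper per router
INSTRUCTIONS_PER_WRAPPER = 10_000  # 100% traffic rate
WORD_BYTES = 4                     # 32-bit data words


def total_packets(traffic_rate_percent):
    """Packets in one simulation, assuming one packet per instruction."""
    return WRAPPERS * INSTRUCTIONS_PER_WRAPPER * traffic_rate_percent // 100


def payload_bytes(traffic_rate_percent, packet_size_words):
    """Payload volume of one simulation, headers excluded."""
    return total_packets(traffic_rate_percent) * packet_size_words * WORD_BYTES


print(total_packets(25))     # packets at a 25% traffic rate
print(payload_bytes(5, 1))   # bytes per unit of packet size per 5% of traffic
print(payload_bytes(25, 8))  # payload of the heaviest simulated case
```

Under these assumptions the calculation matches both numbers stated in the text: 22,500 packets at a 25% traffic rate, and 18,000 bytes for each 5% of traffic per unit of packet size.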
The instructions were distributed over 50,000 cycles, so a result above that amount means saturation. The saturation point for the FIFO is the same as in the throughput results, but only here are the R2R results visible. Unlike the NoCX4, the R2R saturates almost at the 5% traffic rate and then shows a linear increase in ending cycles, proportional to the amount of data in the simulation. At the higher traffic rates, the R2R takes 7 times longer than the FIFO to finish the data transmission, and almost 9 times longer than the SAFC.

Figure 7 Throughput (bytes/cycle), for traffic rates from 5% to 25%; series BUS, FIFO and SAFC with packet sizes 2, 4 and 8

Figure 8 Ending Cycles, for traffic rates from 5% to 25%; series FIFO and BUS with packet sizes 2, 4 and 8

Finally, Figure 9 compares the two versions of the NoCX4. The average latency results are measured in cycles. Since buses have a constant latency, the R2R results were not included. With a 5% traffic load, the results are exactly the same for both versions, because there is no saturation and so the time to send the packets is the same. At a 10% traffic load and a packet size of 8, the FIFO reaches its saturation point. From that point on, its average latency grows. For the SAFC, however, data keep flowing well even after saturation, because of the virtual channels. That gives the SAFC version a controlled latency, even at a 25% traffic rate, and has a major impact on performance. Since both versions of the NoCX4 use the same routing mechanism, scheduling policy and so on, the better performance of the SAFC version is due to its lower latency compared with the FIFO version.

Figure 9 Average Latency (cycles), for traffic rates from 5% to 25%; series FIFO and SAFC with packet sizes 2, 4 and 8

6. Conclusion and Future Works

This paper presented a study of different communication models for an on-chip reconfigurable multiprocessor, the X4CP32. A bus system, the R2R, and a Network-on-Chip, the NoCX4, were developed for this work. The NoCX4 was implemented in two versions, which had the same characteristics except for their buffering. SystemC simulations of the 3 communication models were performed, and the results (throughput, average latency and ending cycles) were presented and discussed. In all results, the R2R proved to be the worst alternative. Changing only the buffering model of the NoCX4 had a major impact on the results, showing that the use of virtual channels decreases the average latency and thus improves the final throughput of the NoC.

As future work, a new hybrid model using both bus and NoC will be designed, and a cache hierarchy will be developed for the X4CP32.

7. References

[1] ARM AMBA.
[2] Altera.
[3] E. Mirsky, A. DeHon: MATRIX: A Reconfigurable Computing Architecture with Configurable Instruction Distribution and Deployable Resources; Proc. IEEE FCCM '96, Napa, CA, USA, April 17-19.
[4] C. Ebeling et al.: RaPiD: Reconfigurable Pipelined Datapath. In: Sixth International Workshop on Field Programmable Logic and Compilers, Lecture Notes in Computer Science, Springer-Verlag, September.
[5] R. Kress et al.: A Datapath Synthesis System for the Reconfigurable Datapath Architecture; ASP-DAC '95, Chiba, Japan, Aug. 29 - Sept. 1.
[6] International Technology Roadmap for Semiconductors.
[7] L. Benini and G. De Micheli. Networks on Chips: a New SoC Paradigm, IEEE Computer, Jan. 2002.
[8] S. Kumar et al. A network on chip architecture and design methodology. VLSI, Proceedings. IEEE Computer Society Annual Symposium on.
[9] A. Jantsch and H. Tenhunen (Editors). Networks on Chip. Kluwer Academic Publishers.
[10] Kai Hwang. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, Inc.
[11] A. Azevedo, R. Soares and I. S. Silva. A New Hybrid Parallel/Reconfigurable Architecture: The X4CP32. In Proceedings of the 16th Symposium on Integrated Circuits and Systems Design. SBCCI '03. ACM Press.
[12] R. Soares, A. Pereira, I. Saraiva. X4CP32: a Programmable Multi-level Reconfigurable Microprocessor. Proceedings of Students Forum on Microelectronics '02, SBC, Porto Alegre.
[13] R. Soares, A. Pereira, I. Saraiva. X4CP32: A Coarse Grain General Purpose Reconfigurable Microprocessor. RAW '03, Nice, France.
[14] A. Pereira, R. Soares, I. Saraiva. Implementação da DCT 2D em arquiteturas reconfiguráveis utilizando a X4CP32. Proceedings of Iberchip '03, Havana, Cuba.
[15] Open SystemC Initiative.
[16] R. Soares, A. Pereira, I. Saraiva. A Case-Study of Communication in a Reconfigurable Architecture: The X4CP32's Communication Buffer. Proceedings of Iberchip '04, Cartagena de Indias, Colombia.
[17] R. A. Bergamaschi and J. Cohn. The A to Z of SoCs. In Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design. ICCAD '02. ACM Press.
[18] L. Benini et al. MPARM: Exploring the Multi-Processor SoC Design Space with SystemC. The Journal of VLSI Signal Processing, 41(2), September 2005.
[19] R. Soares, I. S. Silva and A. Azevedo. When reconfigurable architecture meets network-on-chip. In Proceedings of the 17th Symposium on Integrated Circuits and System Design. SBCCI '04. ACM Press.
[20] D. Pellerin and D. Taylor. VHDL made easy! Prentice-Hall, Inc., 1997.


More information

The Design and Implementation of a Low-Latency On-Chip Network

The Design and Implementation of a Low-Latency On-Chip Network The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

A Literature Review of on-chip Network Design using an Agent-based Management Method

A Literature Review of on-chip Network Design using an Agent-based Management Method A Literature Review of on-chip Network Design using an Agent-based Management Method Mr. Kendaganna Swamy S Dr. Anand Jatti Dr. Uma B V Instrumentation Instrumentation Communication Bangalore, India Bangalore,

More information

Study of Network on Chip resources allocation for QoS Management

Study of Network on Chip resources allocation for QoS Management Journal of Computer Science 2 (10): 770-774, 2006 ISSN 1549-3636 2006 Science Publications Study of Network on Chip resources allocation for QoS Management Abdelhamid HELALI, Adel SOUDANI, Jamila BHAR

More information

ISSN Vol.03, Issue.02, March-2015, Pages:

ISSN Vol.03, Issue.02, March-2015, Pages: ISSN 2322-0929 Vol.03, Issue.02, March-2015, Pages:0122-0126 www.ijvdcs.org Design and Simulation Five Port Router using Verilog HDL CH.KARTHIK 1, R.S.UMA SUSEELA 2 1 PG Scholar, Dept of VLSI, Gokaraju

More information
