PNoC: a flexible circuit-switched NoC for FPGA-based systems


FIELD PROGRAMMABLE LOGIC AND APPLICATIONS

C. Hilton and B. Nelson

Abstract: Increases in chip density due to Moore's law allow for the implementation of ever larger and more complex systems on a single chip (SoCs). The communication mechanisms employed in such SoCs are an important contributor to their overall performance. Networks on chip (NoCs) promise to overcome the scalability problems found in bus-based interconnect. To date, most work has focused on packet-switched NoCs. Circuit-switched networks are an intriguing alternative, promising high communication rates and predictable communication latencies. A new lightweight circuit-switched architecture called programmable NoC (PNoC) is described. PNoC is a flexible architecture that is suitable for use in FPGA-based systems. Implementation results on a Virtex-II Pro device are given using an image binarisation demonstration, which resulted in as much as a 23× speedup compared with a shared-bus implementation.

1 Introduction

Increases in chip density due to Moore's law allow for the implementation of ever larger systems on a single chip. Known as systems on chip (SoCs), these systems usually contain a mixture of CPUs, memories and custom hardware modules. Such SoCs can also be implemented on FPGA substrates; we refer to these as programmable SoCs (PSoCs) in this paper. The inter-module communication mechanisms employed on SoCs and PSoCs have recently received significant attention for at least two reasons. First, traditional bus-based communication mechanisms do not scale well and become a bottleneck as system complexity continues to increase. Second, design and verification times for complex systems continue to grow; the desire for efficiency in design and verification methodologies argues for standardised communication mechanisms instead of ad hoc direct module interconnections.
© The Institution of Engineering and Technology 2006. IEE Proceedings online no doi: /ip-cdt: Paper first received 31st October 2005 and in revised form 8th February 2006. Published in IEE Proc.-Comput. Digit. Tech., Vol. 153, No. 3, May 2006.
C. Hilton is with Rincon Research Corporation, 101 N. Wilmot, Suite 101, Tucson, AZ 85711, USA
B. Nelson is with Electrical and Computer Engineering, Brigham Young University, Provo, UT 84602, USA
E-mail: clint.hilton@gmail.com

Shared buses such as ARM's AMBA bus [1] and IBM's CoreConnect [2] are commonly used communication mechanisms in SoCs and PSoCs. They support a modular design approach that uses standard interfaces and allows for IP re-use [3], but the bus is often the performance bottleneck in a large system. Both Xilinx and Altera support a hybrid bus/direct-interconnect architecture that allows for direct module-to-module connections in addition to the bus interconnect. Hybrid approaches scale better than purely bus-based schemes, but complicate the design process because they reduce the modularity of the system and require custom hardware design for the module-to-module connections. Another alternative is to use multiple buses or bus segments to alleviate the load on the main bus. This allows for local communication between modules on the same bus segment without congesting the rest of the bus. The disadvantages of this approach are its reduced flexibility and scalability, and its complication of the design process.

Various network interconnect approaches have been proposed for SoCs and PSoCs [4-7]. Networks scale better and promise higher communication bandwidth than buses. Like buses, they allow the re-use of standard interface modules for connecting circuit nodes to the network. Network architectures can be divided into two categories, packet-switched and circuit-switched. In a packet-switched approach, the data are broken into packets, each of which contains routing information.
These packets are injected into the network, where they are independently routed to the desired destination. Packet-switched networks often allow for high aggregate system bandwidth, as many packets can be in flight at a given instant. However, they generally require congestion control and packet processing, including buffers to queue up packets awaiting the availability of routing resources.

In contrast, with circuit switching, a dedicated connection path (a virtual circuit) between two nodes is established before communication takes place. Once the virtual circuit is established, raw data can be freely transferred with very low overhead between the modules until the virtual circuit is no longer needed, at which time it can be closed. Circuit-switched networks require no overhead for packetisation, packet header processing or packet buffering. Once the virtual circuit is established, accessing data across a circuit-switched connection is no more difficult than accessing a synchronous memory (the requester sends an address and receives the corresponding data in return after a delay of a few clock cycles). As a result, the circuitry required for a circuit-switched network is relatively simple and appropriate for use in even small systems. The flexibility of the proposed approach makes it suitable in a
variety of topologies from rings to meshes to irregular structures. Additionally, close to the peak bandwidth between modules is easily achieved; in the example presented later, a sustained data transfer rate of 96% of the peak rate was achieved.

Two shortcomings of circuit switching have been noted in the past. First, setup latency, the time required to build a virtual circuit, must be incurred before any communication between nodes can take place. In the system described here, efforts were made to minimise this circuit establishment latency through the use of simple communication protocols. The second problem is idle time on communication links, which results when connections have been established but no transfers are taking place. This is not a major concern in our system: opening and closing connections are lightweight enough that there is little motivation for nodes to monopolise communication links by leaving them open for long periods of time.

The vast majority of the proposed network approaches have involved packet-switching architectures. One fairly representative example is the CLICHE architecture [5]. It is a fixed 2D mesh with one routing switch for each compute node, as shown in Fig. 1. Although this architecture is highly scalable, the authors concede that it is unsuitable for certain heavy dataflow applications for performance reasons. As another example, the work of Marescaux et al. [7] is targeted specifically to FPGAs. It is a 2D torus architecture that performs packet switching using wormhole routing. Of particular interest, it uses partial reconfiguration of the FPGA to support run-time dynamic module replacement.

Among the few references to circuit switching in NoC design, Liu and coworkers [8, 9] provide strong arguments for its advantages over packet switching in NoC-based systems. The architecture proposed by Liu et al.
[8] is a time-division-multiplexed central-switched network (crossbar) shared by all communicating nodes. SoCBUS [9] is a circuit-switched NoC organised as a fixed 2D mesh and includes a routing switch for every compute node. Both of these references perform detailed simulations of circuit-switched NoCs to show their throughput and relative advantages over packet-switched NoCs. The reader is directed to these references for further comparisons of circuit-switched and packet-switched networks. Unfortunately, very few of the previous circuit-switched NoC papers provide implementation or performance data that could be used for a relevant comparison against this work.

In this work, we describe the detailed implementation of a new circuit-switched NoC designed specifically for FPGA-based systems. The flexibility and low resource cost of this architecture are explained as we proceed to quantify its area and performance benefits. Our main motivation for choosing a circuit-switched network over a packet-switched one is its ability to maintain guaranteed throughput between nodes connected via a virtual circuit. This is in direct contrast to packet-switched techniques, where significant variations in communication latency are often possible. The ability to provide reliable high data rates between the nodes that most need them was critical in this design decision. The topological flexibility of this system, described in the following sections, also distinguishes it from previous work, which has been limited to regular mesh or central crossbar architectures.

Our proposed network, PNoC, is designed with three goals in mind. First, we wanted it to be a flexible networking approach that would be applicable to a wide variety of system requirements. Flexibility was desired in both the allowable network topologies and the communication datapath widths.
In this way, our work differs significantly from that reported by Liu and coworkers [8, 9], which focuses solely on crossbars [8] and meshes [9]. Second, we wanted a network that simplified system design by providing simple, standard network interfaces and easily understood network protocols. Third, we wanted our network to be lightweight, requiring few FPGA resources, and thus suitable for both small and large FPGA-based systems.

2 PNoC: circuit-switched NoC for use in FPGA-based systems

PNoC was designed to be extremely flexible. At design time, it is possible to easily construct a variety of network architectures, each with its own mix of system routing and computational resources. In addition, the network modules are parameterised for communication path widths, flow control and timeout handling. At runtime, PNoC's flexibility supports the dynamic removal and insertion of nodes in the system, if supported by the FPGA fabric. (PNoC provides support for dynamic module replacement via routing table updates. However, the creation of partial reconfiguration bitstreams is outside the scope of this paper.)

The proposed network topology consists of a series of subnets, each of which contains a router and a collection of network nodes, similar to that shown in Fig. 2. This style of topology was chosen because modules that communicate frequently can be placed in the same subnet, allowing even more efficient overall system communication.

Fig. 1 CLICHE architecture

Fig. 2 Example PNoC topology

The routers perform the circuit switching between the nodes, and each node connects to a single router through a router port interface. A lightweight handshaking mechanism is used to establish dedicated connections between nodes, to exchange data and to remove connections. The signals required in this circuit-switched communication are described in Table 1. The naming convention is from the node's perspective. That is, signals with a direction
of "in" are inputs to the node (and thus are outputs from the router). The single-bit control signals in the table deal with router table updates, requests to create or destroy virtual circuits and read/write requests. In addition to these signals, each module has a set of receive (rx) and transmit (tx) signals consisting of at least address and data lines. The transmit address lines serve multiple purposes. When creating a virtual circuit, they specify the ID (or address) of the module to which a virtual connection is desired. Once the virtual connection has been established and read and write transactions begin, they specify the address in the remote module's address space to which the transaction refers, contributing to the flexibility of the system. Finally, the cts signals are used for flow control, as described subsequently.

Table 1: PNoC communication signals

Signal         Direction   Description
request        out         initiates either a router update request or a connection request
release        out         initiates a connection release
grant          in          indicates to the network node that its connection request has been granted
sl_grant       in          indicates that a connection has been established with this node as a slave
pend           in          indicates that another node is requesting access to this node's current destination port
rx_data[x:0]   in          rx data bus of parameterisable width
rx_addr[y:0]   in          rx address bus of parameterisable width
rx_rnw         in          rx read-not-write signal
rx_valid       in          indicates valid rx data, address and rnw signals
rx_cts         in          rx clear-to-send signal
tx_data[x:0]   out         tx data bus of parameterisable width
tx_addr[y:0]   out         tx address bus of parameterisable width
tx_rnw         out         tx read-not-write signal
tx_valid       out         indicates valid tx data, address and rnw signals
tx_cts         out         tx clear-to-send signal

2.1 Router

The router is the core of this network communication architecture. The major components that make up the PNoC router are shown in the block diagram of Fig. 3.
The function of each of these components is described as follows.

Table arbiter. The table arbiter receives connection requests and schedules access to the routing table when multiple requests are received on the same clock cycle. This block is also responsible for managing routing table update requests.

Routing table. The routing table maps network module addresses to ports that may be used to establish connections between modules. The node addresses serve as the index to the table, and the data stored at that index represent the port(s) that may be used to establish the connection path.

Port queue. This queue maintains the connection request order while the requests await availability of the target port(s).

Port arbiter. Once the target port(s) becomes available, the port arbiter establishes the desired connection and issues the appropriate grant signals. This block also monitors the release signals for the removal of connections.

Switchbox. The switchbox forms the actual connections between modules by enabling tri-state buffers that allow the rx signals to drive the appropriate tx signals.

Fig. 3 Router block diagram

The actual routing of the data is done through the router's switchbox. The switchbox is structured such that any given rx line can be connected to any of the available tx lines. As the work presented here targets a Xilinx Virtex-II Pro device, this switchbox was implemented with the available internal tri-state drivers. A similar mux-based implementation could be used for devices that do not provide the same tri-state capabilities. Switchboxes implemented as crossbars can be an expensive form of communication, with complexity growing as N^2. An advantage of PNoC over other architectures is that its flexibility lends itself to the use of multiple smaller routers distributed through the system, rather than a central crossbar as used by Liu et al. [8]. The result is a smaller and faster implementation.
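The interaction of the routing table, port queue and port arbiter described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the authors' hardware: the class name `Router`, its method names and the policy of letting a younger request overtake an older one that is waiting on a different target are all hypothetical choices made for illustration.

```python
from collections import deque


class Router:
    """Behavioural sketch of one router (hypothetical model, not the RTL)."""

    def __init__(self, routing_table):
        # routing_table: node address -> list of ports that can serve it.
        self.routing_table = {a: list(p) for a, p in routing_table.items()}
        self.busy = set()        # ports that are part of a live connection
        self.pending = deque()   # (master_port, target_addr), oldest first
        self.links = {}          # master_port -> slave_port

    def request(self, master_port, target_addr):
        """Queue a connection request; return the slave port if granted now."""
        self.pending.append((master_port, target_addr))
        self._arbitrate()
        return self.links.get(master_port)

    def release(self, master_port):
        """Remove a connection, then retry the queued requests."""
        slave_port = self.links.pop(master_port)
        self.busy.discard(master_port)
        self.busy.discard(slave_port)
        self._arbitrate()

    def _arbitrate(self):
        # Assumption: requests are examined oldest first, and a request whose
        # target is busy does not block later requests to other targets.
        still_waiting = deque()
        while self.pending:
            master_port, target_addr = self.pending.popleft()
            free = [p for p in self.routing_table.get(target_addr, [])
                    if p not in self.busy and p != master_port]
            if free and master_port not in self.busy:
                slave_port = free[0]            # first available port wins
                self.busy.update((master_port, slave_port))
                self.links[master_port] = slave_port
            else:
                still_waiting.append((master_port, target_addr))
        self.pending = still_waiting
```

For example, with `Router({5: [2, 3]})` two nodes on ports 2 and 3 share address 5; the first two requests for address 5 are granted ports 2 and 3, a third request waits in the queue, and it is granted as soon as one of the earlier masters calls `release`.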
PNoC allows the inclusion of multiple nodes with the same network address in a system. The router assumes that all nodes with address k are interchangeable and will use the first available such node to satisfy a connection request. This makes it possible to easily alter the mix of modules in processor-farm kinds of designs without the individual nodes being aware of the exact mix. This capability is exploited in the demonstration system described later in the paper.

2.1.1 Routing table updates: The network infrastructure has been designed to support dynamic module replacement via partial reconfiguration. If a node is removed from the system during execution, using partial reconfiguration, its local router should be notified via a router update command, which removes that module from the system's routing tables. When a new module is added to the system, an update command should be sent to its router to add it to the system routing tables. Router update requests are implemented as connection requests addressed to the router itself. All routers are configured
with an address of 0x00. A router update request occurs when an updating node raises its request line while its tx_addr is set to 0x00 and its network address is held on the tx_data lines. As in the connection process, the router raises the grant signal to indicate that the table update is complete.

2.1.2 Router flexibility: An important design goal is to produce a flexible NoC. The network topology can be altered by changing how routers are interconnected in the system. Fig. 4 shows four different systems that use various router configurations. The interconnectivity between the modules and routers is flexible and is defined by the system designer so that the network can meet specific system needs. Each router is parameterisable in the number of ports it contains and the width of the data and address lines in each port connection. Many routers could be used in a single system with a custom topology that best meets the system's demands. The focus of this work is on the creation of NoC building blocks and a framework for using those building blocks, rather than on enumerating and comparing the possible topologies and their respective trade-offs. As shown in the figure, the mix of compute nodes to router nodes can be varied. In addition, multiple links between routers can be used where high traffic is expected, to increase system performance. Finally, although all the routers shown in the figure have eight ports each, this value is parameterisable at build time.

2.2 PNoC module interface

One of the goals of this work is to facilitate the design of complex systems through modular design using a simple interface to the communication medium. Modules, or nodes, connect to the network via a well-defined port interface that contains multi-bit transmit and receive data and address lines along with handshaking control signals (Table 1). Fig. 5 shows the hardware needed to effectively integrate a module with the network.
On the left is the node circuitry itself, and on the right is the network interface circuitry, which consists of optional transmit and receive FIFOs with their associated cts signals (not shown) and a simple FSM to communicate with the router.

Fig. 5 Node interface hardware

A CPU can be readily connected to the PNoC like any other node. A special CPU interface module decodes the CPU's memory accesses to identify and initiate accesses to the memory-mapped network infrastructure. Fig. 6 shows how a CPU interfaces to the network with the memory-to-network address translation and a standard PNoC module interface. The implementation used in this work is a Xilinx MicroBlaze CPU combined with a custom memory-mapped network interface, similar to that shown in the figure.

2.2.1 Inter-node data flow control: All routers in the system operate using a common, synchronised clock. Each node, however, may operate at its own clock rate. FIFOs are used between nodes and routers to buffer data and to cross between the node's clock domain and the router's clock domain. Status signals from the FIFOs are provided as part of the node connection to serve as end-to-end flow control signals. The inclusion of transmit and/or receive FIFOs in the node interface is a parameterisable feature of the node interface design and thus is optional. However, these FIFOs are strictly necessary in two cases: (1) the node runs at a clock rate different from that of its subnet router or (2) the node, when acting as a slave, is unable to keep up with the data transmission rate of potential masters. In case (1), both transmit and receive FIFOs are needed to cross between the node's clock domain and the router's clock domain. In case (2), the slower consuming slave node must use a receive FIFO. The almost-full status flag on the receiving FIFO is used for flow control so that data already in flight can safely reach the target node.
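Why the almost-full flag, rather than the full flag, is the natural trigger can be seen in a small cycle-based simulation. The sketch below is illustrative only: the function name `simulate`, the symmetric `link_delay` in each direction and the fixed drain rate of the slave are assumptions, not parameters taken from the paper.

```python
from collections import deque


def simulate(depth, almost_full, link_delay, drain_every, cycles):
    """Return the peak occupancy of the slave's receive FIFO.

    The FIFO is safe if the returned peak never exceeds `depth`.
    Assumptions: the master sends one word per cycle whenever it sees
    cts high, and each direction of the link adds `link_delay` cycles.
    """
    fifo = 0
    in_flight = deque([0] * link_delay)   # words travelling master -> slave
    cts_pipe = deque([1] * link_delay)    # cts travelling slave -> master
    peak = 0
    for cycle in range(cycles):
        fifo += in_flight.popleft()       # word (if any) arrives at the FIFO
        peak = max(peak, fifo)
        if fifo and cycle % drain_every == 0:
            fifo -= 1                     # slow slave consumes one word
        # Slave lowers tx_cts once the FIFO reaches the almost-full level.
        cts_pipe.append(0 if fifo >= almost_full else 1)
        rx_cts = cts_pipe.popleft()       # delayed value seen by the master
        in_flight.append(1 if rx_cts else 0)
    return peak
```

In this model, words keep arriving for roughly one round trip after cts is lowered, so the gap between `almost_full` and `depth` has to cover that round trip. With `depth=16`, `link_delay=2` and `drain_every=4`, an `almost_full` level of 12 keeps the occupancy within the FIFO, whereas waiting until the FIFO is completely full (a level of 16) lets in-flight words overrun it.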
Flow control is governed by the cts signals. When the receiving node detects almost-full at its receive FIFO, it lowers its tx_cts signal, which is propagated to the transmitting node's rx_cts signal. At this point, the transmitting node must stall until its rx_cts is raised again, indicating that the receiving node is capable of accepting more data.

Fig. 4 Four different network topologies

Fig. 6 CPU interface

2.2.2 Connection establishment and data transfer: To establish a connection, the requesting node (the master, Node A in Fig. 2) asserts its request signal to the router while specifying the desired target node address on its
tx_addr lines. The router determines which port is associated with the desired target node (the slave, Node B in the figure) by consulting its routing table. In this example, the connection request is forwarded to a second router, as Node B resides in its subnet. The second router then processes the request and determines whether the target node is available. Once it becomes available, the second router informs the first router, which informs the master via a grant signal, and the connection is established. The slave node is also informed of the establishment of a connection via the sl_grant signal. The master and slave are then free to transfer data as desired.

Data transfers between nodes are done using a simple interface. A write-followed-by-read sequence is shown in Fig. 7. The signals in the top half of the figure are those seen by the master module and the signals in the lower half are those seen by the slave. In cycle 1, the write operation begins when the master's tx_valid = 1 and tx_rnw = 0. As the data transfer occurs on a dedicated connection path, there is no need for an acknowledge signal, nor is there a need to specify which node the write operation is directed to. In this write transaction, there is only a single cycle of delay through the router (no FIFOs are present). Thus, the write is initiated in cycle 1 and completes at the slave node at the end of cycle 2. The initiation of a read request is shown in cycle 2, where the master's tx_valid = 1 and tx_rnw = 1. The slave node receives these signals on its rx_valid and rx_rnw lines in cycle 3, and responds by placing the requested data on its tx_data lines and raising its tx_valid (cycle 4). The master captures the returned data on its rx_data lines when it sees its rx_valid = 1 in cycle 5. The figure also shows a second read request in cycle 3, illustrating that write and read requests can be pipelined.
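The cycle numbers in the write-followed-by-read example can be reproduced with a very small timing model. It is a sketch only: the one-cycle router delay comes from the text, while the one-cycle slave response time and the names `schedule`, `ROUTER_DELAY` and `SLAVE_DELAY` are assumptions chosen to match the sequence described for Fig. 7.

```python
ROUTER_DELAY = 1  # one pipeline register in the switching fabric (from the text)
SLAVE_DELAY = 1   # assumption: the slave answers one cycle after seeing a read


def schedule(requests):
    """Map (issue_cycle, kind) requests to (kind, issue, at_slave, done).

    `done` is the cycle in which a write reaches the slave, or in which
    read data is captured by the master.
    """
    timeline = []
    for issue, kind in requests:
        at_slave = issue + ROUTER_DELAY          # slave sees rx_valid
        if kind == "write":
            done = at_slave
        else:
            reply = at_slave + SLAVE_DELAY       # slave raises tx_valid
            done = reply + ROUTER_DELAY          # master sees rx_valid
        timeline.append((kind, issue, at_slave, done))
    return timeline
```

Calling `schedule([(1, "write"), (2, "read"), (3, "read")])` places the write at the slave in cycle 2 and returns the first read's data to the master in cycle 5, as in the text; the second read, issued one cycle later, completes in cycle 6, which is the pipelining the paragraph describes.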
As long as the slave's cts (not shown) is held high, the master can initiate a new data transaction on every clock cycle. If the slave cannot keep up, it can de-assert its tx_cts, thereby causing the master to temporarily suspend data transfers. This example shows the timing in the absence of FIFOs in the module interfaces. The inclusion of such FIFOs does not change the timing seen in the figure. However, the tx_cts signal is then automatically de-asserted by the node interface logic in response to FIFO almost-full conditions rather than being manually de-asserted by the slave node itself.

As can be seen, transactions between master and slave nodes are similar to those for interfacing to pipelined memories: requests are sent and a number of cycles later the data are returned with accompanying valid signals. The router is not involved in read and write transactions except that (a) it provides the signal switching fabric so that master and slave can communicate and (b) it provides one pipeline register in the switching fabric to improve the throughput (clock rate) of the pipelined data transfers.

The master removes a connection to another block by asserting the release signal, informing the router that it no longer needs the connection. Additionally, a pend signal is supplied to the master to tell it when another node wants access to the slave node. The master may, at its discretion, close its connection early in response. This behaviour is not mandated by the network, but the router functionality is provided to support it. Once the master node releases a connection, both it and the affected slave node become available for use in other connections.

3 Implementation results

The PNoC building blocks described earlier have been implemented on a Xilinx Virtex-II Pro FPGA (xc2vp30-7). Design entry was done with JHDL [10].
The resulting NoC building block modules (the router, the node interface and the CPU-node interface) are parameterised as described in the previous sections. Table 2 gives the area and speed results for a variety of router instances with differing numbers of ports and differing port data widths. In each case, the routing table is implemented using a single BlockRAM, which is not reflected in the table. In addition to the router, a complete system requires node interface circuitry. This node interface hardware (containing the FIFOs from Fig. 5) requires 155 slices and two BlockRAMs. In cases where the FIFOs are not required, the area is reduced to 62 slices. The MicroBlaze CPU node interface circuitry, including the memory-mapped network interface module, requires 196 slices and two BlockRAMs.

Fig. 7 Master node write followed by read

Table 2: Router implementation results

Number of ports   Data width   Area (slices)   Speed (MHz)

4 Test application

The utility of the PNoC is shown here using a simple image binarisation example. This algorithm uses hierarchical thresholding to quantise greyscale image pixels to binary black and white values. The computation involves computing median values at three different levels of hierarchy to be used as quantisation threshold values. The algorithm involves the following steps:

1. Compute the median value for the entire image and use that to compute the global threshold value, where global_thresh = median + median/4.
2. For each block of data in the image, determine its darkest pixel value and compare that against the global threshold. If it is lower (darker) than the threshold, then that block presumably contains valid data (it is called a valid block), and is processed further in step 3. Otherwise, the entire block of data is set to a white value.
3. Each valid block is divided into smaller windows. Those windows are compared with a block threshold value and those which require additional processing are subjected to window processing in step 4. Otherwise, the window pixels are set to white.
4. Each pixel in a valid window is compared against the computed window threshold and set to black or white accordingly.
5. Steps 2-4 are repeated until every block has been processed and the complete quantised image has been produced and collected back to the CPU.

This application, targeted to a Virtex-II Pro FPGA (xc2vp30-ff860-7), is illustrated in Fig. 8 and consists of four module types. The MicroBlaze processor is the primary control for the system; it computes the global threshold value for the image and manages the distribution of the image blocks to the block modules. The universal asynchronous receiver-transmitter (UART) enables the uploading and downloading of the original and final images between the FPGA and a host computer.
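Steps 1 and 2 of the algorithm listed above can be sketched in a few lines of software. This is an illustration under assumptions rather than the hardware implementation: the pixels are taken to be 8-bit greyscale values with 255 as white, the division by four is taken to be integer division, and the function names are hypothetical. Steps 3 and 4 are omitted because the text does not say how the block and window thresholds are derived.

```python
from statistics import median

WHITE = 255  # assumption: 8-bit greyscale pixels, 255 = white


def global_threshold(pixels):
    """Step 1: global_thresh = median + median/4 (integer division assumed)."""
    m = int(median(pixels))
    return m + m // 4


def is_valid_block(block, thresh):
    """Step 2: a block is valid if its darkest pixel is below the threshold."""
    return min(block) < thresh


def screen_block(block, thresh):
    """Step 2: keep a valid block for further processing, else whiten it."""
    if is_valid_block(block, thresh):
        return list(block)
    return [WHITE] * len(block)
```

For an image whose median pixel value is 100, `global_threshold` gives 125; a block containing a pixel of value 50 is then kept for step 3, while a block whose darkest pixel is 130 is set entirely to white.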
The block modules compute a block-level threshold value and, if valid data are detected within the block, divide the block into windows and send these to the window modules. The window modules quantise each pixel of a window on the basis of the window's threshold value.

Fig. 8 Binarisation top-level modules

This binarisation application was implemented both with PNoC and with two different bus-based approaches. The main design challenge in this system is in coordinating the transfer of image data between the different nodes. The major communications are between the CPU and block processors, and between the block and window processors. The numbers of block modules and window modules differ because of the projected need for each kind of processing in the overall computation; however, the exact utilisation of the window modules is unpredictable, as it is completely data dependent. Because of the parallel processing and hence parallel communication involved in this system, the PNoC implementation should noticeably outperform the bus implementations because of its ability to support multiple simultaneous data transfers.

4.1 Shared-bus implementation

Two different bus-based implementations were completed using Xilinx EDK version 6.3. Each contains a MicroBlaze processor and an on-chip peripheral bus (OPB) running at 100 MHz. The first implementation uses simple reads and writes to transfer data on the bus. This has the advantage of allowing other modules bus access during the computation but results in a slower implementation. The second implementation allows the block modules to lock the bus and burst the window data transfers. This results in a faster system but prevents other modules from using the bus during those transfers (essentially during the entire computation). In a bus-based system there is no built-in way of arbitrating or scheduling access to the window modules without designing custom arbitration into the modules themselves.
In these implementations, each window module was designed to satisfy requests from two statically chosen block modules. Similarly, there is no built-in way of scheduling the bus other than relying on bus arbitration for concurrent requests. Manually time-multiplexing the bus and manually scheduling access to the window modules to improve performance without locking the bus for extended periods of time would greatly complicate the design task. Therefore the second bus implementation uses the built-in bus locking mechanism instead of a custom bus scheduling scheme.

4.2 Network implementation

The PNoC is well suited to this type of system. Multiple block-to-window module data transfers can occur simultaneously, as multiple connections can be active at a given instant. Also, because of the unpredictable nature of window module utilisation, the dynamic routing capability of this network plays an important role in this system. As all window modules are configured with a common network address, when a block module requires the services of a window module, the connection can be established with whichever window module becomes available first. The system designer does not need to add hardware to poll for available window modules; the choice of which window module to use is made by the router. Further, if no window module is available, the router queues connection requests in order until a window module becomes available. This allows for considerable flexibility in the system:
additional block and window modules can be added to an existing design and recompiled for execution without any changes being required of the block and window modules.

4.3 System comparisons

Table 3 compares the implementations. About 1150 slices of the network design were for the eight-port router. Both designs were downloaded to the Xilinx XUP Virtex-II Pro Development Board. Times were recorded in such a way as to remove software overhead on the MicroBlaze from the computation time, so as to compare just networking overhead. Blocks of data were first loaded into the four block modules and then the computation/communication time of the block and window modules was measured using a hardware timer. The experiments were set up to show maximum data transfer capability (all four block modules were competing for the services of the two window modules).

Our original shared-bus design used individual bus reads and writes for the data transfer, resulting in a 23× performance advantage for the network version. Using individual reads and writes has the advantage of not shutting down the system bus while the computation proceeds (other transactions between the CPU and other modules can compete for bus cycles and thus make progress during the computation). Our second bus design improved performance by allowing the block modules to completely lock the shared bus while they perform burst transfers. By doing so, the performance difference was reduced to 2×. In this second version, however, the bus was completely locked for entire window computations (because of the streaming nature of the window computations, the bus had to be locked during the entire computation). This prevented any other activity between the CPU and other system modules from occurring for long periods of time.

To be fair, other methods could be used to make a bus-based approach workable for this problem instance. Bus bridges to isolate the CPU from much of the window module traffic could be employed.
While freeing up the CPU to do other things during the window computations, this would still limit the system to physically performing one window computation at a time, as all block and window modules would share a single bus. Attempts to further partition this shared bus into multiple buses, each servicing a subset of the block and window modules, defeat the design goal of providing a flexible platform that allows a uniform pool of window modules to serve as a processor farm for a uniform collection of block modules. In short, bus-based approaches have limited scalability compared with network-based approaches.

In contrast, the PNoC version allows the CPU to communicate with other system modules (such as the UART) during the computation. It also allows multiple transfers between block modules and window modules to take place concurrently, making it possible to achieve a significant fraction of the maximum available computational power present in the design. For example, the computation required the use of two window modules. At 124 MHz, each window module could conceivably maintain a 124 MB/s transfer rate with its associated block module. Each achieved, on average, 119 MB/s, or 96% of the maximum bandwidth. Similarly, the utilisation of each window module was 96% over the course of the computation.

The PNoC architecture would demonstrate similar performance advantages for any system that requires concurrent data transfers. The ability of the PNoC architecture to let multiple modules communicate simultaneously in a flexible way is its major advantage over a bus-based implementation. Additionally, the clock rate of the network implementation was 27% higher than that of the bus-based implementations. This is consistent with results we have seen for other applications we have completed, and is due to the shorter and less heavily loaded wires in the PNoC architecture compared with a shared-bus architecture.
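The router behaviour described in Section 4.2 (all window modules share one network address, a request is connected to whichever window module is free, and further requests are queued in order) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the PNoC hardware: the class name `FirstAvailableRouter`, its methods and the module labels are hypothetical, and the choice of the longest-idle window module when several are free is an assumption the paper does not specify.

```python
from collections import deque


class FirstAvailableRouter:
    """Toy model of a router serving several modules behind one shared address.

    Hypothetical illustration only; it models connection set-up and tear-down,
    not data transfer or timing.
    """

    def __init__(self, window_ids):
        self.free = deque(window_ids)  # idle window modules (assumed FIFO order)
        self.pending = deque()         # waiting block modules, in arrival order
        self.connections = {}          # block module -> window module

    def request(self, block):
        """A block module asks for a connection to the shared window address."""
        if self.free:
            window = self.free.popleft()
            self.connections[block] = window
            return window
        # No window module is idle: queue the request in arrival order.
        self.pending.append(block)
        return None

    def release(self, block):
        """Tear down a connection and hand the window module to the next waiter."""
        window = self.connections.pop(block)
        if self.pending:
            next_block = self.pending.popleft()
            self.connections[next_block] = window
            return next_block, window
        self.free.append(window)
        return None


# Four block modules competing for two window modules, as in the experiment.
router = FirstAvailableRouter(["W0", "W1"])
print(router.request("B0"))  # W0
print(router.request("B1"))  # W1
print(router.request("B2"))  # None (queued)
print(router.request("B3"))  # None (queued)
print(router.release("B1"))  # ('B2', 'W1')
print(router.release("B0"))  # ('B3', 'W0')
```

In this model the block modules never name a specific window module, which mirrors the point made in the text: adding more block or window modules changes only the list handed to the router, not the modules themselves.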
4.4 Network architecture comparison

Unfortunately, few authors have published implementation results for their proposed FPGA NoC architectures. As a result, it is difficult to perform direct comparisons with other network approaches. One implementation for which results have been published, and which appears to be representative of other packet-switched approaches, is presented by Bartic et al. [11]. Their system, a 2D mesh similar in topology to that shown in Fig. 1, consisted of eight network modules and was targeted to a Xilinx Virtex-II Pro device. The communication datapaths in their system were 16 bits wide. It was assumed that each of the eight routers required a single BlockRAM for its output buffers. An equivalent PNoC architecture consists of a single eight-port router and associated compute node interfaces. It was also targeted to a Xilinx Virtex-II Pro device. Table 4 compares the two implementations in terms of area and clock rate. The simplicity of the circuit-switched PNoC architecture not only reduces the hardware costs by over 2×, it also increases the clock rate in this example by almost 3×, resulting in an area-time improvement of over 5×.

Table 4: Comparison to packet-switched network of Bartic et al. [11]
Columns: Network architecture, Slices, BRAMs, Clock rate (MHz)
Rows: Packet-switched, PNoC

Table 3: Binarisation system comparison
Columns: Parameter, Shared bus, Locked bus, PNoC
Rows: MicroBlaze, UART, Block module, Window module, Communication, Total slices, Max clock rate (MHz), Cycle count

5 Summary and future work

In this paper, we have proposed and demonstrated a flexible, lightweight circuit-switched approach to constructing FPGA-based systems. It provides the ease of design (using standard interfaces) of a bus-based approach while providing performance that approaches that of direct interconnect. We believe it is flexible enough for use in general embedded systems and fast enough for many high-throughput data flow applications.
This first experiment has quantified the implementation cost of the basic PNoC modules on an FPGA and demonstrated their utility in a real application, at the same time showing the ease of design using PNoC as well as its potential performance.

A number of directions for continued work remain, as several open questions have surfaced throughout this work. First, we want to explore the use of multiple routers and subnets in a system. Specifically, it is important to understand the kinds of topologies useful for various patterns of computing and communication, to identify the most commonly occurring patterns and to quantify the advantages of PNoC for such patterns.

Further, we want to investigate the applicable solution space for circuit-switched systems such as PNoC. Our expectation is that PNoC's topological flexibility will be an important characteristic, allowing for high delivered performance across a wider range of applications than the standard crossbars and meshes used in the previously cited circuit-switched NoC work. Unfortunately, at present, there are no readily accessible packet-switched systems with which we can make meaningful comparisons. As they become available, the important work will be to perform detailed comparisons to understand the trade-offs between them and PNoC. Where are circuit-switched networks a better choice than packet-switched networks and buses? How do they compare for power consumption, clock rate, system throughput and ease of design and debug?

Finally, an important goal in the creation of PNoC was to support dynamic module replacement via partial reconfiguration. As mentioned, PNoC provides support for adding modules to and deleting modules from a running system, provided the target FPGA fabric supports some form of runtime reconfiguration. We wish to investigate the use of this capability in real applications where requirements change over time and necessitate changing the mix of modules in an embedded computing system.
6 References

1 ARM: 'AMBA specification', Technical report, ARM, Revision 2.0
2 CoreConnect: 'CoreConnect bus architecture', Technical report, IBM Corporation
3 Salminen, E., Lahtinen, V., Kuusilinna, K., and Hamalainen, T.: 'Overview of bus-based system-on-chip interconnections'. Proc. IEEE Int. Symp. on Circuits and Systems, May 2002, vol. 2, pp. II372-II375
4 Dally, W.J., and Towles, B.: 'Route packets, not wires: On-chip interconnection networks'. Proc. Design Automation Conf., DAC 01, June 2001
5 Kumar, S., and Jantsch, A.: 'A network on chip architecture and design methodology'. Proc. IEEE Computer Society Annual Symp. on VLSI, ISVLSI 02, April 2002
6 Grecu, C., Pande, P.P., Ivanov, A., and Saleh, R.: 'A scalable communication-centric SoC interconnect architecture'. Proc. 5th Int. Symp. on Quality Electronic Design, 2004
7 Marescaux, T., Bartic, A., Verkest, D., Vernalde, S., and Lauwereins, R.: 'Interconnection networks enable fine-grain dynamic multi-tasking on FPGAs'. Proc. 12th Int. Conf. on Field-Programmable Logic and Applications, FPL 02, September 2002
8 Liu, J., Zheng, L.-R., and Tenhunen, H.: 'A circuit-switched network architecture for network-on-chip'. Proc. Int. Symp. on System-on-Chip, September 2004
9 Wiklund, D., and Liu, D.: 'SoCBUS: switched network on chip for hard real time embedded systems'. Proc. Int. Parallel and Distributed Processing Symp., April
10 Hutchings, B., Bellows, P., Hawkins, J., Hemmert, S., Nelson, B., and Rytting, M.: 'A CAD suite for high-performance FPGA design', in Pocek, K.L., and Arnold, J.M. (Eds.): Proc. IEEE Workshop on FPGAs for Custom Computing Machines, Napa, CA, USA, April 1999 (IEEE Computer Society)
11 Bartic, A., Mignolet, J.Y., Nollet, V., Marescaux, T., Verkest, D., Vernalde, S., and Lauwereins, R.: 'Highly scalable network on chip for reconfigurable systems'. Proc. Int. Symp. on System-on-Chip, November 2003

IEE Proc.-Comput. Digit. Tech., Vol. 153, No. 3, May 2006

Implementation of PNoC and Fault Detection on FPGA

Implementation of PNoC and Fault Detection on FPGA Implementation of PNoC and Fault Detection on FPGA Preethi T S 1, Nagaraj P 2, Siva Yellampalli 3 Department of Electronics and Communication, VTU Extension Centre, UTL Technologies Ltd. Abstract In this

More information

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor

More information

ECE 551 System on Chip Design

ECE 551 System on Chip Design ECE 551 System on Chip Design Introducing Bus Communications Garrett S. Rose Fall 2018 Emerging Applications Requirements Data Flow vs. Processing µp µp Mem Bus DRAMC Core 2 Core N Main Bus µp Core 1 SoCs

More information

Embedded Systems: Hardware Components (part II) Todor Stefanov

Embedded Systems: Hardware Components (part II) Todor Stefanov Embedded Systems: Hardware Components (part II) Todor Stefanov Leiden Embedded Research Center, Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded

More information

2. System Interconnect Fabric for Memory-Mapped Interfaces

2. System Interconnect Fabric for Memory-Mapped Interfaces 2. System Interconnect Fabric for Memory-Mapped Interfaces QII54003-8.1.0 Introduction The system interconnect fabric for memory-mapped interfaces is a high-bandwidth interconnect structure for connecting

More information

ISSN Vol.03, Issue.02, March-2015, Pages:

ISSN Vol.03, Issue.02, March-2015, Pages: ISSN 2322-0929 Vol.03, Issue.02, March-2015, Pages:0122-0126 www.ijvdcs.org Design and Simulation Five Port Router using Verilog HDL CH.KARTHIK 1, R.S.UMA SUSEELA 2 1 PG Scholar, Dept of VLSI, Gokaraju

More information

A High Performance Bus Communication Architecture through Bus Splitting

A High Performance Bus Communication Architecture through Bus Splitting A High Performance Communication Architecture through Splitting Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University,West Lafayette, IN, 797, USA {lur, chengkok}@ecn.purdue.edu

More information

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 02, 2014 ISSN (online): 2321-0613 Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek

More information

A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems

A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems Abstract Reconfigurable hardware can be used to build a multitasking system where tasks are assigned to HW resources at run-time

More information

Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

SoC Design Lecture 11: SoC Bus Architectures. Shaahin Hessabi Department of Computer Engineering Sharif University of Technology

SoC Design Lecture 11: SoC Bus Architectures. Shaahin Hessabi Department of Computer Engineering Sharif University of Technology SoC Design Lecture 11: SoC Bus Architectures Shaahin Hessabi Department of Computer Engineering Sharif University of Technology On-Chip bus topologies Shared bus: Several masters and slaves connected to

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Applying the Benefits of Network on a Chip Architecture to FPGA System Design white paper Intel FPGA Applying the Benefits of on a Chip Architecture to FPGA System Design Authors Kent Orthner Senior Manager, Software and IP Intel Corporation Table of Contents Abstract...1 Introduction...1

More information

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC QoS Aware BiNoC Architecture Shih-Hsin Lo, Ying-Cherng Lan, Hsin-Hsien Hsien Yeh, Wen-Chung Tsai, Yu-Hen Hu, and Sao-Jie Chen Ying-Cherng Lan CAD System Lab Graduate Institute of Electronics Engineering

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

Chapter 2 The AMBA SOC Platform

Chapter 2 The AMBA SOC Platform Chapter 2 The AMBA SOC Platform SoCs contain numerous IPs that provide varying functionalities. The interconnection of IPs is non-trivial because different SoCs may contain the same set of IPs but have

More information

Networks-on-Chip Router: Configuration and Implementation

Networks-on-Chip Router: Configuration and Implementation Networks-on-Chip : Configuration and Implementation Wen-Chung Tsai, Kuo-Chih Chu * 2 1 Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 413, Taiwan,

More information

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 797- flur,chengkokg@ecn.purdue.edu

More information

Design of a System-on-Chip Switched Network and its Design Support Λ

Design of a System-on-Chip Switched Network and its Design Support Λ Design of a System-on-Chip Switched Network and its Design Support Λ Daniel Wiklund y, Dake Liu Dept. of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract As the degree of

More information

VLSI Design of Multichannel AMBA AHB

VLSI Design of Multichannel AMBA AHB RESEARCH ARTICLE OPEN ACCESS VLSI Design of Multichannel AMBA AHB Shraddha Divekar,Archana Tiwari M-Tech, Department Of Electronics, Assistant professor, Department Of Electronics RKNEC Nagpur,RKNEC Nagpur

More information

Fast Flexible FPGA-Tuned Networks-on-Chip

Fast Flexible FPGA-Tuned Networks-on-Chip This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Fast Flexible FPGA-Tuned Networks-on-Chip Michael K. Papamichael, James C. Hoe

More information

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems

MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems MinRoot and CMesh: Interconnection Architectures for Network-on-Chip Systems Mohammad Ali Jabraeil Jamali, Ahmad Khademzadeh Abstract The success of an electronic system in a System-on- Chip is highly

More information

ISSN:

ISSN: 113 DESIGN OF ROUND ROBIN AND INTERLEAVING ARBITRATION ALGORITHM FOR NOC AMRUT RAJ NALLA, P.SANTHOSHKUMAR 1 M.tech (Embedded systems), 2 Assistant Professor Department of Electronics and Communication

More information

Multi MicroBlaze System for Parallel Computing

Multi MicroBlaze System for Parallel Computing Multi MicroBlaze System for Parallel Computing P.HUERTA, J.CASTILLO, J.I.MÁRTINEZ, V.LÓPEZ HW/SW Codesign Group Universidad Rey Juan Carlos 28933 Móstoles, Madrid SPAIN Abstract: - Embedded systems need

More information

Ultra-Fast NoC Emulation on a Single FPGA

Ultra-Fast NoC Emulation on a Single FPGA The 25 th International Conference on Field-Programmable Logic and Applications (FPL 2015) September 3, 2015 Ultra-Fast NoC Emulation on a Single FPGA Thiem Van Chu, Shimpei Sato, and Kenji Kise Tokyo

More information

Hardware Design. University of Pannonia Dept. Of Electrical Engineering and Information Systems. MicroBlaze v.8.10 / v.8.20

Hardware Design. University of Pannonia Dept. Of Electrical Engineering and Information Systems. MicroBlaze v.8.10 / v.8.20 University of Pannonia Dept. Of Electrical Engineering and Information Systems Hardware Design MicroBlaze v.8.10 / v.8.20 Instructor: Zsolt Vörösházi, PhD. This material exempt per Department of Commerce

More information

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema [1] Laila A, [2] Ajeesh R V [1] PG Student [VLSI & ES] [2] Assistant professor, Department of ECE, TKM Institute of Technology, Kollam

More information

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology Outline SoC Interconnect NoC Introduction NoC layers Typical NoC Router NoC Issues Switching

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

Hardware Design. MicroBlaze 7.1. This material exempt per Department of Commerce license exception TSU Xilinx, Inc. All Rights Reserved

Hardware Design. MicroBlaze 7.1. This material exempt per Department of Commerce license exception TSU Xilinx, Inc. All Rights Reserved Hardware Design MicroBlaze 7.1 This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: List the MicroBlaze 7.1 Features List

More information

SEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010

SEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010 SEMICON Solutions Bus Structure Created by: Duong Dang Date: 20 th Oct,2010 Introduction Buses are the simplest and most widely used interconnection networks A number of modules is connected via a single

More information

Chapter 3. Top Level View of Computer Function and Interconnection. Yonsei University

Chapter 3. Top Level View of Computer Function and Interconnection. Yonsei University Chapter 3 Top Level View of Computer Function and Interconnection Contents Computer Components Computer Function Interconnection Structures Bus Interconnection PCI 3-2 Program Concept Computer components

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

Network-on-Chip Architecture

Network-on-Chip Architecture Multiple Processor Systems(CMPE-655) Network-on-Chip Architecture Performance aspect and Firefly network architecture By Siva Shankar Chandrasekaran and SreeGowri Shankar Agenda (Enhancing performance)

More information

Design of Synchronous NoC Router for System-on-Chip Communication and Implement in FPGA using VHDL

Design of Synchronous NoC Router for System-on-Chip Communication and Implement in FPGA using VHDL Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.

More information

1. Define Peripherals. Explain I/O Bus and Interface Modules. Peripherals: Input-output device attached to the computer are also called peripherals.

1. Define Peripherals. Explain I/O Bus and Interface Modules. Peripherals: Input-output device attached to the computer are also called peripherals. 1. Define Peripherals. Explain I/O Bus and Interface Modules. Peripherals: Input-output device attached to the computer are also called peripherals. A typical communication link between the processor and

More information

Buses. Maurizio Palesi. Maurizio Palesi 1

Buses. Maurizio Palesi. Maurizio Palesi 1 Buses Maurizio Palesi Maurizio Palesi 1 Introduction Buses are the simplest and most widely used interconnection networks A number of modules is connected via a single shared channel Microcontroller Microcontroller

More information

High Performance Interconnect and NoC Router Design

High Performance Interconnect and NoC Router Design High Performance Interconnect and NoC Router Design Brinda M M.E Student, Dept. of ECE (VLSI Design) K.Ramakrishnan College of Technology Samayapuram, Trichy 621 112 brinda18th@gmail.com Devipoonguzhali

More information

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

Design and Implementation of Buffer Loan Algorithm for BiNoC Router Design and Implementation of Buffer Loan Algorithm for BiNoC Router Deepa S Dev Student, Department of Electronics and Communication, Sree Buddha College of Engineering, University of Kerala, Kerala, India

More information

Efficient And Advance Routing Logic For Network On Chip

Efficient And Advance Routing Logic For Network On Chip RESEARCH ARTICLE OPEN ACCESS Efficient And Advance Logic For Network On Chip Mr. N. Subhananthan PG Student, Electronics And Communication Engg. Madha Engineering College Kundrathur, Chennai 600 069 Email

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques

Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques Design of Reconfigurable Router for NOC Applications Using Buffer Resizing Techniques Nandini Sultanpure M.Tech (VLSI Design and Embedded System), Dept of Electronics and Communication Engineering, Lingaraj

More information

The Design and Implementation of a Low-Latency On-Chip Network

The Design and Implementation of a Low-Latency On-Chip Network The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current

More information

The CoreConnect Bus Architecture

The CoreConnect Bus Architecture The CoreConnect Bus Architecture Recent advances in silicon densities now allow for the integration of numerous functions onto a single silicon chip. With this increased density, peripherals formerly attached

More information

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari Global Journal of Computer Science and Technology: E Network, Web & Security Volume 15 Issue 6 Version 1.0 Year 2015 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP 1 M.DEIVAKANI, 2 D.SHANTHI 1 Associate Professor, Department of Electronics and Communication Engineering PSNA College

More information

ISSN Vol.03, Issue.08, October-2015, Pages:

ISSN Vol.03, Issue.08, October-2015, Pages: ISSN 2322-0929 Vol.03, Issue.08, October-2015, Pages:1284-1288 www.ijvdcs.org An Overview of Advance Microcontroller Bus Architecture Relate on AHB Bridge K. VAMSI KRISHNA 1, K.AMARENDRA PRASAD 2 1 Research

More information

Improving Routing Efficiency for Network-on-Chip through Contention-Aware Input Selection

Improving Routing Efficiency for Network-on-Chip through Contention-Aware Input Selection Improving Routing Efficiency for Network-on-Chip through Contention-Aware Input Selection Dong Wu, Bashir M. Al-Hashimi, Marcus T. Schmitz School of Electronics and Computer Science University of Southampton

More information

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics Computer and Hardware Architecture II Benny Thörnberg Associate Professor in Electronics Parallelism Microscopic vs Macroscopic Microscopic parallelism hardware solutions inside system components providing

More information

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor

More information

Network on Chip Architectures BY JAGAN MURALIDHARAN NIRAJ VASUDEVAN

Network on Chip Architectures BY JAGAN MURALIDHARAN NIRAJ VASUDEVAN Network on Chip Architectures BY JAGAN MURALIDHARAN NIRAJ VASUDEVAN Multi Core Chips No more single processor systems High computational power requirements Increasing clock frequency increases power dissipation

More information

Co-Design and Co-Verification using a Synchronous Language. Satnam Singh Xilinx Research Labs

Co-Design and Co-Verification using a Synchronous Language. Satnam Singh Xilinx Research Labs Co-Design and Co-Verification using a Synchronous Language Satnam Singh Xilinx Research Labs Virtex-II PRO Device Array Size Logic Gates PPCs GBIOs BRAMs 2VP2 16 x 22 38K 0 4 12 2VP4 40 x 22 81K 1 4

More information

Lecture 18: Communication Models and Architectures: Interconnection Networks

Lecture 18: Communication Models and Architectures: Interconnection Networks Design & Co-design of Embedded Systems Lecture 18: Communication Models and Architectures: Interconnection Networks Sharif University of Technology Computer Engineering g Dept. Winter-Spring 2008 Mehdi

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

Integrated Modeling and Generation of a Reconfigurable Network-On-Chip

Integrated Modeling and Generation of a Reconfigurable Network-On-Chip Integrated Modeling and Generation of a Reconfigurable Network-On-Chip Doris Ching dorisc@ee.ucla.edu atrick Schaumont schaum@ee.ucla.edu Electrical Engineering Department, UCLA Ingrid Verbauwhede ingrid@ee.ucla.edu

More information

Dr e v prasad Dt

Dr e v prasad Dt Dr e v prasad Dt. 12.10.17 Contents Characteristics of Multiprocessors Interconnection Structures Inter Processor Arbitration Inter Processor communication and synchronization Cache Coherence Introduction

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction In a packet-switched network, packets are buffered when they cannot be processed or transmitted at the rate they arrive. There are three main reasons that a router, with generic

More information

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Maheswari Murali * and Seetharaman Gopalakrishnan # * Assistant professor, J. J. College of Engineering and Technology,

More information

SONA: An On-Chip Network for Scalable Interconnection of AMBA-Based IPs*

SONA: An On-Chip Network for Scalable Interconnection of AMBA-Based IPs* SONA: An On-Chip Network for Scalable Interconnection of AMBA-Based IPs* Eui Bong Jung 1, Han Wook Cho 1, Neungsoo Park 2, and Yong Ho Song 1 1 College of Information and Communications, Hanyang University,

More information

Cost-and Power Optimized FPGA based System Integration: Methodologies and Integration of a Lo

Cost-and Power Optimized FPGA based System Integration: Methodologies and Integration of a Lo Cost-and Power Optimized FPGA based System Integration: Methodologies and Integration of a Low-Power Capacity- based Measurement Application on Xilinx FPGAs Abstract The application of Field Programmable

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information
