Novel Intelligent I/O Architecture Eliminating the Bus Bottleneck

Volker Lindenstruth

The continued increase in Internet throughput and the emergence of broadband access networks drive the development of communication processors. Other developing arenas for the application of intelligent I/O are system area networks and storage area networks, used to cluster computers and mass storage systems, respectively. For all the advantages of such devices, however, they share a common memory bottleneck, which originates from the internal bus that connects the I/O ports with the internal processor core. This paper presents a novel intelligent I/O architecture that eliminates this bottleneck by means of a novel data transfer controller, which yields a four-fold improvement over conventional designs.

I. INTRODUCTION

The amount of processing related to any kind of input/output is increasing, as is the throughput demand. In addition, latency and processor overhead are becoming increasingly important factors in the performance of input/output systems. One consequence is the intelligent I/O paradigm [4]. Communication processors are used in applications ranging from intelligent port cards for scalable multi-protocol routers [2] to intelligent network interfaces. The latter use local intelligence to relieve the host processor of low-level network transactions and to reduce the interrupt rate, which improves the scalability of parallel computers and clusters. Other applications include intelligent network ports that perform digital video compression for a particular network link, and data encryption/decryption in the network interface, which allows the transparent use of public networks for protected data transfers. Various processor architectures are being studied or have been implemented [1, 3].

The data paths in a communication processor are typically connected by an internal bus.
The advantage is that all devices connected to this bus can exchange data with one another without requiring additional hardware. The disadvantage is that buses are broadcast-type devices: any data sent is visible to all devices and blocks the bus entirely. In the case of unified transaction buses and read transactions, any response latency stalls the bus. Pipelined arbitration and split transaction buses help to utilize a bus better, but the principal limitation remains that a bus supports only one master at any point in time.

As one example, Figure 1 sketches the architecture of the Intel 960Rx Intelligent I/O Processor as published in [3]. It implements two ports, which allow direct, mutual data exchange using the internal PCI bridge, but also direct data access by the internal microcontroller to either of the two ports. If, however, data is required to be handled by the processor, it cannot bypass the PCI bridge but has to be received by the internal processor, requiring the use of the internal data bus.

[Figure 1: The Intel IOP architecture. Blocks: Processor, Bus MIU, Internal data bus, DMA Interface 1, DMA Interface 2, PCI-PCI Bridge, Local Bus BIU. Data paths: Incoming Data, Outgoing Data, Instruction Fetches]

Consider one typical scenario: an Ethernet packet is received and is to be inspected by the processor in order to determine whether it is a data or a control packet and where to route it inside the host computer, based on the packet header and on routing information stored within the IOP. The packet arrives, for example, at port 1 and is routed through DMA interface 1 to the internal data bus. Since the internal memory is typically small, the data may have to be sent to the internal back-end bus using the memory bus interface unit (MIU). The internal processor then needs to inspect the data, again using the internal data bus and the MIU.
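The receive-and-inspect path described so far can be tallied in a short sketch. It only illustrates that transfers on a single-master bus serialize; the transfer labels, the packet size, the one-cycle-per-word cost, and the assumption that the processor reads every word are all illustrative and are not taken from the 960Rx.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    description: str  # hypothetical label for the bus master and direction
    words: int        # data words moved; one bus cycle per word is assumed


def bus_cycles(transfers):
    """On a single-master bus, transfers serialize, so occupancy adds up."""
    return sum(t.words for t in transfers)


PACKET_WORDS = 16  # assumption: a 64-byte packet on a 32-bit bus

receive_and_inspect = [
    Transfer("DMA interface 1 -> MIU (store packet)", PACKET_WORDS),
    Transfer("MIU -> processor (inspect packet)", PACKET_WORDS),
]

total = bus_cycles(receive_and_inspect)
accesses_per_word = total / PACKET_WORDS
print(total, accesses_per_word)
```

Under these assumptions every data word already crosses the internal bus twice before any modification or forwarding takes place.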
Note that during the data transmission the processor is unlikely to be able to perform any accesses to the internal bus and is thus totally blocked. Any access to the packet has to be done in this way. Therefore, any data word in the packet that has to be modified, for example when implementing a data encryption/decryption algorithm, needs to be read and written, involving two accesses to the internal data bus. Any cache miss that results in additional instruction or data fetches not addressing the internal memory also has to use the internal bus. Then, if the packet is to be forwarded to the second port, it again uses the internal bus in order to be transmitted to port 2 through DMA interface 2.

In this scenario, at least four bus accesses per data word of each packet are required (one to store the received word, two to read and write it during processing, and one to forward it), plus any instruction and data fetches caused by cache misses of the microprocessor. To achieve the highest throughput, the internal data bus would therefore have to run at more than four times the port data rate. The example also shows that the given scenario basically implements a store-and-forward scheme rather than a cut-through paradigm. The situation could be improved slightly by adding dual-ported memories or FIFOs to the DMA interfaces, but the internal data bus bottleneck remains. This is a fundamental design problem of all bus-based architectures.

Network interfaces and router port cards commonly require cut-through elasticity buffers at the input and output ports, which decouple the various data streams with respect to access latencies such as arbitration delays, different clock domains, and the like. Typically, such buffers are implemented as FIFOs. FIFOs, in turn, are typically based internally on a dual-ported memory with independent read and write pointers at the respective ports.

[Figure 2: multi-port memory architecture. Labels: transistors Q1 to Q8, Select1, Select2, bit1, bit2]

Figure 2 shows the principle of operation of a dual-port memory. Q1 through Q4 form the internal storage cell holding the bit, while Q5, Q6 and Q7, Q8 are used to read out the bit at the two ports, respectively. It is apparent that further independent readout paths can be added without affecting the existing data path.

II. ARCHITECTURE

Combining a memory that has at least three ports with a processor and two external ports yields a new class of high-performance IOPs, as sketched in Figure 3.

[Figure 3: the IOP architecture. Blocks: Processor, Processor Bus, Bus MIU, Local Bus BIU, DMA Interface 1, DMA Interface 2, Triple Port memory, Pass-Through-Logic]

Using the same scenario as above, the packet is received at port 1 and flows directly into the triple-port memory. At any point in time, the internal processor can access any already received portion of the data using its processor bus and its private port to the triple-port memory, without affecting the data stream at all. Any data conversion can happen basically one clock cycle after the word was received. As soon as the processor is done processing, the packet can be forwarded to the second port. If that port runs at the same speed, no further handshake is required and the transmission can start immediately, even if the packet has not been received completely, thus implementing cut-through routing.

If address translation and access control tables are also placed in the triple-port memory, this routing functionality can even be implemented by simple state machines with very little overhead. This provides PCI bridge functionality, as indicated in the figure (pass-through logic). In that case, the triple-port memory functions as an elasticity buffer for both input and output ports without requiring any movement of data at all, resulting in minimal latency and power consumption. Which packets are to be forwarded directly and which packets require processor intervention can be determined easily by the DMA interfaces, as they can snoop the packet headers at their respective input ports.

III. IMPLEMENTATION

To prove the presented concept and demonstrate its operability, a prototype was built from commercial off-the-shelf components. In the long run, the majority of the logic can be integrated, reducing cost. The prototype implements a PCI-SCI [5] system area network adapter. The architecture, however, is limited neither to the system area network environment nor to this particular network transport. Rather, it allows very flexible adaptation between various bus and network standards.

[Figure 4: First implementation of the IOP]

The network interface implements protocol conversion between the unified bus PCI and the split transaction network SCI. Incoming SCI requests are address translated and executed as PCI master, provided proper access rights are configured in the internal access control tables. PCI requests are translated transparently into split transactions. In case of PCI write bursts, the data is posted within the NIC and appropriate network write requests are produced. Because the writes are posted, any resulting errors arrive out of sequence and have to be dispatched by an appropriate error handler. For debugging, writes can be made synchronous. PCI read transactions operate against a local cache. In case of a cache hit, the data resides inside the multi-port memory and is returned immediately. In case of a read miss, the host is stalled by issuing PCI retry cycles, and appropriate read request packets are generated at the network port. As soon as the successful read response is received, the internal cache tags are updated and the stalled read transaction is completed.

[Figure 5: SCI-to-PCI DMA transaction while the microcontroller accesses the elasticity buffer]

Figure 5 above shows a board-level simulation of a burst of incoming 64-byte SCI packets being stored in the multi-port RAM. The data is forwarded immediately (here with a delay of 12 clocks) to the PCI front-end bus. This latency includes all address translation, access control, and buffer management inside the device, which is carried out on the fly. The transmission of one 64-byte packet takes 20 clocks, of which 16 clocks are the data itself (32-bit bus), plus one address phase and three clocks for the FlyBy address translation. As shown in the following section, these extra clocks are not required in principle and are only an implementation artifact of the particular multi-port memories chosen. Should the device be integrated into an ASIC, these clocks can be saved. Even so, at 33 MHz PCI, 20 clocks per 64-byte packet correspond to 107 MB/s aggregate throughput. In order to demonstrate the independence of the various memory ports, the PCI and SCI transactions happen simultaneously. In addition, the microcontroller performs read/write transactions to the multi-port memory core, as shown in Figure 5. All these transactions happen completely asynchronously and independently of any of the other bus transactions.

IV. NETWORK TO PCI TRANSACTIONS: FLYBY ADDRESS TRANSLATION

Any incoming network packet needs to be routed according to whether it is a data or a control packet. In the case of data packets, address translation and access control have to be performed in order to allow the data to be copied directly to the destination buffer in the host's memory, thus implementing zero-copy network communication. Figure 6 below sketches the translation scheme, here assuming a 4 KB page size. Some bits from both the source network ID and the target address fields are used to identify the entry in the address translation table. Here, nine bits were chosen without restricting the general case.

[Figure 6: address translation scheme. Labels: SCI Source ID, SCI Address, SCI->PCI LUT, CTL, PCI Address]

An address translation requires the new address to be read from the address translation entry (ATE) and typically a write transaction to generate the updated address. Such transactions typically cause bus cycles and thus cost memory bandwidth and add latency, as the packet cannot be forwarded before the address translation has completed.

[Figure 7: Triple-Port layout for FlyBy Address Translation. Labels: SCI SourceID, SCI Subaddress, Address Translation Index, Address Translation Table, buffer 1, buffer 2, buffer 3, ..., buffer N, page size, address domain A, address domain B]

Figure 7 shows the triple-port memory (TPM) layout for the FlyBy address translation. The incoming data address in the transport address space is snooped, together with the sender's network source ID, by the DMA controller handling the incoming packet while it stores this packet in an available buffer (here buffer 2). Since this information is part of the packet header, it is available before the first data word is received. The address translation table index is derived from the snooped header fields according to the scheme depicted in Figure 6. It is then forwarded immediately to the PCI state machine, which immediately starts requesting the PCI bus. The first word to be transmitted is the target address, which is composed of the lowest 12 bits of the target subaddress, taken directly from the packet header (4 KB pages), and the translated address taken from the address translation table.

The PCI port of the triple-port memory is implemented as two separate memories with independent data and address buses, the least significant word (LSW) being 12 bits wide. All other ports of the TPM have their address buses connected. For the PCI bus port, however, it is possible to select different regions of the memory for the lower and the upper data bus word and thus to compile the translated address without a single memory reference. The buffer slot number and the address translation index are known a priori. The address of the least significant data bus word is driven such that the target address of the packet header is selected. The address of the most significant word is driven to select the appropriate index of the ATE stored in the TPM. Therefore, the correct target address is assembled in FlyBy without any additional memory references. As indicated in Figure 7, additional bits are provided that allow access-controlled and write-protected memory regions to be defined. Upon completion of the PCI burst, an appropriate network acknowledgement is generated in order to complete the split transaction protocol with the peer requestor.
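The address composition described above can be modeled in a few lines. This is a minimal sketch under stated assumptions: the text fixes the 12-bit page offset and the nine-bit table index, but the split of index bits between source ID and address (`SRC_BITS`), the entry layout `(page_base, writable)`, and the function names are hypothetical.

```python
PAGE_BITS = 12    # 4 KB pages, as in the text
INDEX_BITS = 9    # 512-entry translation table, as in the text
SRC_BITS = 3      # assumption: number of source-ID bits used in the index


def att_index(source_id, sci_address):
    """Form the table index from source-ID bits and page-number bits."""
    page_bits = INDEX_BITS - SRC_BITS
    page = (sci_address >> PAGE_BITS) & ((1 << page_bits) - 1)
    src = source_id & ((1 << SRC_BITS) - 1)
    return (src << page_bits) | page


def translate(att, source_id, sci_address, write=True):
    """Return the PCI target address, or raise if access is not permitted."""
    entry = att[att_index(source_id, sci_address)]
    if entry is None:
        raise PermissionError("no translation entry configured")
    page_base, writable = entry
    if write and not writable:
        raise PermissionError("region is write protected")
    offset = sci_address & ((1 << PAGE_BITS) - 1)  # low 12 bits pass through
    return page_base | offset


att = [None] * (1 << INDEX_BITS)
att[att_index(5, 0x3ABC)] = (0x12345000, True)   # hypothetical mapping
pci_address = translate(att, 5, 0x3ABC)
print(hex(pci_address))
```

In the hardware described above, the two halves of the address are driven onto the PCI bus from different memory regions in the same cycle; the table lookup and bitwise OR here only model the resulting value.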
To cover a larger subaddress region than the 2 MB window defined by 512 entries of 4 KB pages each, the address translation table is implemented as an address translation cache with appropriate ATT tags.

V. PCI TO NETWORK TRANSACTIONS

PCI to network transactions are more complex, as they are unified transactions and need to be broken up into request/response transactions. In case of a PCI write burst, the target address is stored together with the corresponding data in an available buffer in the triple-port memory. After the target address has been received (after the first clock), the outbound address translation and access control are performed and an appropriate network header is generated. Once the data has been received completely, the write packet is queued for shipment to the network and completion of the posted write is signaled to the PCI host. For debugging, this scheme can be made synchronous, at a corresponding loss in performance.

Read transactions are more complex, as they require the remote data to be available before the transaction can be completed. The cycle starts similarly to a write transaction. The PCI target address is stored in an available buffer slot. The address is also snooped by the PCI state machine, which uses address bits 6-11 as the index into a direct-mapped data cache. The cache tag is read from the same port of the triple-port memory. This is possible since, after one bus turnaround cycle, the initiator now tries to read and thus keeps its data outputs at high impedance. If this is the first request of its kind, the cache tag will be invalid (V bit clear or invalid Address Tag), requiring a network read request, which is generated similarly to the PCI write transaction. At this point the PCI requestor stalls.
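The tag lookup of the read path can be sketched as follows. Only the index bits (6-11), the 64-byte block size, and the V bit with an address tag and buffer ID come from the text; taking the address tag from the bits above bit 11, the `CacheTag` layout, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass

BLOCK_BITS = 6   # 64-byte cache blocks
INDEX_BITS = 6   # address bits 6-11 select one of 64 tags, as in the text


@dataclass
class CacheTag:
    valid: bool = False   # V bit
    adr_tag: int = 0      # assumption: address bits above bit 11
    buf_id: int = 0       # buffer slot holding the cached data


def cache_index(pci_address):
    return (pci_address >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)


def address_tag(pci_address):
    return pci_address >> (BLOCK_BITS + INDEX_BITS)


def lookup(tags, pci_address):
    """Return the buffer ID on a hit, or None on a miss."""
    tag = tags[cache_index(pci_address)]
    if tag.valid and tag.adr_tag == address_tag(pci_address):
        return tag.buf_id
    return None


tags = [CacheTag() for _ in range(1 << INDEX_BITS)]
first = lookup(tags, 0x0004_2A40)    # miss: a network read request is needed
# Model the tag being filled once the data is available in buffer 7.
tags[cache_index(0x0004_2A40)] = CacheTag(True, address_tag(0x0004_2A40), 7)
second = lookup(tags, 0x0004_2A7C)   # same 64-byte block: hit
print(first, second)
```

A direct-mapped organization keeps the lookup to a single tag read, which fits the single turnaround cycle available on the PCI port; associative schemes would need several tag reads or parallel comparators.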
[Figure 8: transparent PCI to network transaction cache tag. Fields: Tag (RT), V, R, PCIAdr, AdrTag, BufId, STag, S, TagSel. Legend: V (valid): Data/BufID valid. R (request sent): AdrTag valid, Data/BufID invalid. S: this is a single transaction cache tag (include Tag). X: ignore.]

At some point in time, a read response is received and stored anywhere in the input buffer space. Upon arrival of the read response packet, the cache tag is updated by writing the correct Address Tag and BufferID. This can easily be done while the requestor is locking the PCI port of the triple-port memory by using, for example, the microcontroller port of the memory. As soon as the requestor sees the valid read cache tag, it matches the Address Tag, uses the BufID entry to calculate the correct address of the read data, and completes the transaction. Any further request to this memory block (here 64 bytes) results in an immediate cache hit. A large variety of caching and prefetching strategies is conceivable. Given the high cost of triple-port memories, an additional backing store is provided, which allows a larger second-level cache to be implemented on the network interface. Data is moved between the triple-port memory and the backing store using FlyBy DMA.

All accesses to the device are intercepted by the packet state machine described above. This functionality can also be used for accesses to the local control and status register (CSR) space of the device, which is not just a memory-mapped region of hardware status bits. The device's control and status region is a specific memory region that is treated by the hardware like any other transaction. The firmware, however, interprets the CSR read/write commands and executes them accordingly. In order to reduce latency, a mirror of the internal CSR status can be produced at a defined location in the host memory. Given this architecture, it is possible to implement any memory map or CSR layout without changing any hardware. Basically any software interface can be accommodated by adapting the firmware accordingly.

VI. OUTLOOK

Given the high flexibility of the device, it can implement many input/output architectures, including I2O and VIA. Furthermore, its applicability is not limited to the particular choice of SCI as network transport. To demonstrate the flexibility of the architecture, a potential application to another network architecture, InfiniBand, is outlined here. The baseline InfiniBand Architecture is modular and flexible, implementing separate functions such as the Host Channel Adapter (HCA) and the IB switch. However, any host will require both functions to be present. This results in unnecessary additional latency, overhead, and silicon real estate.

[Figure 9: application of the multi-ported IOP to the InfiniBand architecture. Blocks: CPU, CPU, Mem, Mem Ctl., IOP, Switch Control, MultiPort]

Figure 9 above sketches an application of the discussed multi-ported IOP to the InfiniBand architecture. Here, the necessary switch is combined with the host channel adapter, using the multi-port memory. The device is further supplemented by the intelligent I/O processor.

The multi-port memory can supply one port to each IB link, thus allowing completely independent and asynchronous data exchange between any ports. Data being received can be forwarded to any port without the need to move any bit of the message. Cut-through routing is a simple consequence. The available elasticity buffer space can be assigned dynamically to any channel. Any port of the switch implemented here can be operated at any speed without affecting any other port, including the interface to the IOP and the host memory controller. Any data bit being routed through the HCA can be accessed by the IOP without affecting any other part of the data stream. Packets can be reformatted on the fly while others are being received and/or forwarded through third ports of the device. This architecture merges the switch's input and output buffers into the same memory bits, while avoiding the need for internal buses, crossbars, or multiplexers. By placing address translation/access control tables appropriately within the multi-port memory, the discussed FlyBy address translation can be performed here as well. The IOP can be tightly integrated into the HCA architecture and can thus access any part of the data stream very effectively without affecting any of the other ports. In fact, any data access by the IOP is completely asynchronous and independent of any of the other ports.

VII. SUMMARY

A novel intelligent network interface architecture is presented as a concept that avoids the throughput bottleneck of conventional IOP architectures. The first prototype implementation, a symmetric PCI-SCI bridge, demonstrates various advantages of this architecture, such as combined input/output buffers, zero copying of any data, and FlyBy address translation. The architecture also supports effective bridging between radically different network or bus standards. All I/O ports implement queuing functionality, which supports multiple outstanding transactions. For further reading refer to [6].

VIII. BIBLIOGRAPHY

[1] Architectural Considerations for CPU and Network Interface Integration, Hot Interconnects 1999
[2] Router Architectures and the True Data Transport Infrastructure, Hot Interconnects 1999
[3] i960rd, Garbus et al., US Patent 5,734,847
[4] I2Osig
[5] Scalable Coherent Interface, IEEE
[6] Method and apparatus for enabling high performance intelligent I/O using multi port memories, US Patent 7,042,961


More information

Universal Serial Bus Host Interface on an FPGA

Universal Serial Bus Host Interface on an FPGA Universal Serial Bus Host Interface on an FPGA Application Note For many years, designers have yearned for a general-purpose, high-performance serial communication protocol. The RS-232 and its derivatives

More information

Parallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?

Parallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors? Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing

More information

Interconnection Structures. Patrick Happ Raul Queiroz Feitosa

Interconnection Structures. Patrick Happ Raul Queiroz Feitosa Interconnection Structures Patrick Happ Raul Queiroz Feitosa Objective To present key issues that affect interconnection design. Interconnection Structures 2 Outline Introduction Computer Busses Bus Types

More information

Copyright 2016 Xilinx

Copyright 2016 Xilinx Zynq Architecture Zynq Vivado 2015.4 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Identify the basic building

More information

Architecture Specification

Architecture Specification PCI-to-PCI Bridge Architecture Specification, Revision 1.2 June 9, 2003 PCI-to-PCI Bridge Architecture Specification Revision 1.1 December 18, 1998 Revision History REVISION ISSUE DATE COMMENTS 1.0 04/05/94

More information

5 MEMORY. Overview. Figure 5-0. Table 5-0. Listing 5-0.

5 MEMORY. Overview. Figure 5-0. Table 5-0. Listing 5-0. 5 MEMORY Figure 5-0. Table 5-0. Listing 5-0. Overview The ADSP-2191 contains a large internal memory and provides access to external memory through the DSP s external port. This chapter describes the internal

More information

Portland State University ECE 588/688. Directory-Based Cache Coherence Protocols

Portland State University ECE 588/688. Directory-Based Cache Coherence Protocols Portland State University ECE 588/688 Directory-Based Cache Coherence Protocols Copyright by Alaa Alameldeen and Haitham Akkary 2018 Why Directory Protocols? Snooping-based protocols may not scale All

More information

Module 6: INPUT - OUTPUT (I/O)

Module 6: INPUT - OUTPUT (I/O) Module 6: INPUT - OUTPUT (I/O) Introduction Computers communicate with the outside world via I/O devices Input devices supply computers with data to operate on E.g: Keyboard, Mouse, Voice recognition hardware,

More information

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon

More information

The RM9150 and the Fast Device Bus High Speed Interconnect

The RM9150 and the Fast Device Bus High Speed Interconnect The RM9150 and the Fast Device High Speed Interconnect John R. Kinsel Principal Engineer www.pmc -sierra.com 1 August 2004 Agenda CPU-based SOC Design Challenges Fast Device (FDB) Overview Generic Device

More information

Storage. Hwansoo Han

Storage. Hwansoo Han Storage Hwansoo Han I/O Devices I/O devices can be characterized by Behavior: input, out, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections 2 I/O System Characteristics

More information

Storage Systems. Storage Systems

Storage Systems. Storage Systems Storage Systems Storage Systems We already know about four levels of storage: Registers Cache Memory Disk But we've been a little vague on how these devices are interconnected In this unit, we study Input/output

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Storage and Other I/O Topics I/O Performance Measures Types and Characteristics of I/O Devices Buses Interfacing I/O Devices

More information

INPUT/OUTPUT ORGANIZATION

INPUT/OUTPUT ORGANIZATION INPUT/OUTPUT ORGANIZATION Accessing I/O Devices I/O interface Input/output mechanism Memory-mapped I/O Programmed I/O Interrupts Direct Memory Access Buses Synchronous Bus Asynchronous Bus I/O in CO and

More information

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola 1. Microprocessor Architectures 1.1 Intel 1.2 Motorola 1.1 Intel The Early Intel Microprocessors The first microprocessor to appear in the market was the Intel 4004, a 4-bit data bus device. This device

More information

SoC Design Lecture 11: SoC Bus Architectures. Shaahin Hessabi Department of Computer Engineering Sharif University of Technology

SoC Design Lecture 11: SoC Bus Architectures. Shaahin Hessabi Department of Computer Engineering Sharif University of Technology SoC Design Lecture 11: SoC Bus Architectures Shaahin Hessabi Department of Computer Engineering Sharif University of Technology On-Chip bus topologies Shared bus: Several masters and slaves connected to

More information
