
Virtual Interface Architecture over Myrinet

EEL Computer Architecture, Dr. Alan D. George
Project Final Report
Department of Electrical and Computer Engineering, University of Florida
Edwin Hernandez
December 1998

Implementation of a Virtual Interface Architecture over Myrinet
Edwin Hernandez - hernande@hcs.ufl.edu

1. Introduction

Network interfaces have been improving over the years, but network protocols add a large amount of overhead that translates into higher latency and lower useful throughput. Traditional protocols such as TCP, UDP, and TP4 are not appropriate for high-performance environments, where lightweight communication is required in order to approach the theoretically achievable bandwidth. In other words, protocols have to be lightweight: they should maximize useful throughput and minimize latency. Several software and hardware companies, seeing this problem, have come up with new ideas. One of them is the Virtual Interface Architecture (VIA); in fact, VIA was born from the combined effort of Compaq, Microsoft, and Intel [VIA98]. The Virtual Interface specification is defined as a standard, several organizations have endorsed it, and version 1.0 of the standard is now available [VIA97]. Several papers have been published on the Virtual Interface Architecture in which Virtual Interfaces (VIs) were implemented to prove the concept as well as to demonstrate the reduction in latency and the gain in bandwidth, using different NICs such as Myrinet and even Ethernet [Dunn98], [Eick98]. This line of work was preceded by PA-RISC network interface architectures [Banks93] and virtual protocols for Myrinet [Rosu95]; moreover, several researchers have tried to localize the bottlenecks and performance improvements in NICs, such as the work by [Davi93] and [Rama93], which established general memory-management concepts as well as I/O handling techniques. As described in Section 5, all the measurements and tests were performed on the Myrinet test-bed in the High Performance Computing and Simulation Research Lab (HCS Lab) at the University of Florida.

2. Background

The Virtual Interface Architecture is a new concept that fits well with the ideas behind high-performance networks and the design of clusters. The VIA tries to boost performance by avoiding excessive data copying and by carrying out its tasks without traversing many protocol layers; these issues are explained in Section 2.1.

2.1. The Virtual Interface Architecture [VIA98]

The VIA attacks the problem of the relatively low achievable performance of inter-process communication (IPC) within a cluster [1]. IPC performance is determined by overhead: the software overhead added to send/receive operations as a message crosses the network. The number of software layers traversed implies a great number of context switches, interrupts, and data copies at each boundary crossing. Although faster processor clocks help in processing the software layers, they are not a decisive factor in recovering the lost performance (cache misses carry large penalties, and layered software implies many branches). Meanwhile, with the introduction of OC-3 ATM, network bandwidth is climbing from a few Mbps to 155 Mbps, with 1 Gbps backbones, but this "raw" bandwidth can almost never be achieved. With these two trends in mind, Intel and the other companies developed the VIA, which can be described in terms of two components: a user agent and a kernel agent. The user agent is the software layer using the architecture; it can be an application or a communication-services layer. The kernel agent is a driver running in protected (kernel) mode; it sets up the tables and structures that allow communication between cooperating processes. VIA accomplishes low latency in a message-passing environment by following these rules:

- Eliminate any intermediate copies of the data.
- Eliminate the need for a driver running in protected kernel mode to multiplex the hardware resource.
- Avoid traps into the operating system whenever possible, avoiding CPU context switches as well as cache thrashing.
- Remove the constraint of requiring an interrupt when initiating an I/O operation.
- Define a simple set of operations that send and receive data.
- Keep the architecture simple enough to be emulated in software.

VIA presents each process with the illusion that it owns the interface to the network. Each VI consists of one send queue and one receive queue, and is owned and maintained by a single process.

[1] Cluster computing consists of short-distance, low-latency, high-bandwidth IPC between multiple building blocks. Cluster building blocks include servers, workstations, and I/O subsystems, all of which connect directly to a network.

A process can own many Virtual Interfaces (VIs), many processes can own many VIs, and the kernel itself can also own a VI. A VI queue is formed by a linked list of variable-length descriptors. To add a descriptor to a queue, the user builds the descriptor and posts it onto the tail of the appropriate work queue; that same user pulls completed descriptors off the head of the work queue on which they were posted. The process that owns the queue can post four types of descriptors: send, remote-DMA/write, and remote-DMA/read descriptors are placed on the send queue of a VI, while receive descriptors are placed on the receive queue. VIA also provides polling and blocking mechanisms to synchronize the user process with completed operations. When descriptor processing completes, the NIC writes a done bit, together with any error bits associated with that descriptor, into the descriptor's status fields; this act transfers ownership of the descriptor from the NIC back to the process that originally posted it. Completion queues are an additional construct that allows completion notifications from multiple work queues to be coalesced into a single queue; the two work queues of one VI can be associated with completion queues independently of one another. The descriptors mentioned above are constructs that describe the work to be done by the network interface, very similar to the architecture proposed in [Davi93] (a small sketch of these queue mechanics follows the list below). Send/receive descriptors contain one control segment and a variable number of data segments. Remote-DMA/write and remote-DMA/read descriptors contain one additional address segment following the control segment and preceding the data segments. The VIA also provides:

- Immediate data: a 32-bit data field carried inside a descriptor.
- Ordering: descriptor order is preserved within a FIFO queue. Consistency is easy to maintain for send/receive and remote-DMA/write; a remote-DMA/read, however, is a round-trip transaction and is not complete until the requested data is returned from the remote node/endpoint.
- Work-queue scheduling: there is no implicit ordering relationship between descriptors on different VIs, so the scheduling discipline depends on the algorithm used by the NIC.
- Memory protection: ensures that a user process cannot send out of, or receive into, memory that it does not own.
- Virtual-address translation: performed when the kernel agent registers a memory region. The kernel agent performs ownership checks (the request comes from a user agent), pins the pages into physical memory, and records the virtual-to-physical address translation for the region.
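To make the queue mechanics concrete, the following is a minimal C++ sketch of a descriptor with a done bit and a FIFO work queue, in the spirit of the constructs just described. The names (DescType, WorkQueue, and so on) are illustrative only; they are not the VIP_* types defined in the VIA specification.

    #include <cstdint>
    #include <deque>

    // Illustrative VIA-style descriptor: the NIC sets 'done' (plus any
    // error bits in 'status') to hand ownership back to the poster.
    enum class DescType { Send, Recv, RdmaWrite, RdmaRead };

    struct DataSegment {
        void*    address;   // registered buffer
        uint32_t length;    // bytes to transfer
    };

    struct Descriptor {
        DescType    type;
        DataSegment data;       // one segment for brevity; VIA allows several
        uint32_t    immediate;  // optional 32-bit immediate data
        bool        done;       // written by the NIC on completion
        uint32_t    status;     // error bits written by the NIC
    };

    // A work queue is a FIFO of descriptors: post on the tail, reap from the head.
    class WorkQueue {
        std::deque<Descriptor*> q_;
    public:
        void post(Descriptor* d) { d->done = false; q_.push_back(d); }

        // Poll the head; a descriptor is handed back only once its done bit is set.
        Descriptor* poll() {
            if (!q_.empty() && q_.front()->done) {
                Descriptor* d = q_.front();
                q_.pop_front();
                return d;
            }
            return nullptr;  // head not complete yet
        }
    };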

[Figure 1. VI Architectural Model: a VI consumer (application, OS communication interface, and VI user agent) runs in user mode and issues send/receive/RDMA-read/RDMA-write operations on the send and receive queues of its VIs; the VI kernel agent sits in kernel mode, and the VI network adapter services the queues.]

3. MODEL DESIGN

For this class project there was not enough time to build a hardware, on-chip implementation of the VI using the Myrinet interface; however, it is quite possible to interact with the Myrinet card and emulate the Virtual Interface purely in software. For this reason Appendix 1 gives the source code in C++, which sits on top of the Myrinet layer; the performance enhancements reached therefore will not be as high as expected, but the model followed by this project still holds with some modifications. Note that RDMA transfers as well as error handling were left aside for this project; the only concerns in the design of the VI were to:

- Initialize the VI and interact with it
- Implement the send and receive queues
- Implement the completion queues
- Use the standard data types mentioned in the specification [VIA97]
- Use a small ECHO/REPLY server application for the performance tests

In addition, the software drives the Myrinet adapter in DMA mode for its transfers. At a very early stage this was not given much attention, but it turned out not to be a good performance "booster" and the measurements obtained are quite low; this aspect is explained in Section 5.

The basic objects used are:

Myrinet: in charge of send, receive, and initialization; it talks directly to the Myrinet card.
- init(): initializes the interface; here it is needed to change the direction of the Myrinet DMA transfer. In other words, the interface first sends data and then must be reinitialized to receive the reply from the server; the same happens at the server.
- Send() and Recv(): post to and interact with the shmem* structure, reading data from the Myrinet's SRAM and posting data into it.

VI: the Virtual Interface object, containing the send queue, the receive queue, and the completion queue. Its class members are described below; a sketch of the post/process pair follows the list.
- NIC: references the instance of the interface being used, in this case Myrinet.
- CQ, SendQ, RecvQ: the queues of descriptors, handled as List objects. The List object was also developed for this project and contains all the functions of a linked list.
- SetupDescriptorSendRecv(): initializes a descriptor, for send or for receive, to or from the work queues. The descriptor created here has the data type VIP_DESCRIPTOR defined in the VIA specification.
- ViPostSend(): posts the send descriptor to the queue; it does not transmit the data.
- ViProcessSend(): pops the first descriptor from the queue and starts delivering the content it points to (DS[0].Local.Data.Address) to the shmem->sendbuffer pointer. This is the real send.
- ViPostRecv(): posts a receive descriptor into the receive queue; it carries the addresses where the incoming data will be stored. An application can also receive a descriptor from the other end, depending on the protocol used; for this basic application the receive descriptor is formed at the receiving peer.
- ViProcessRecv(): reception happens through this method; like ViProcessSend(), it pops the first element of the receive queue and writes whatever is read from the NIC object into the destination address.
- EchoServer(): member function that makes the process act as a server, waiting for incoming data and replying with the same data.
- EchoClient(): member function that sends a block of MTU (Maximum Transfer Unit) size to the other end, waits for a reply, and compares what was sent against the content of the received data.
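As a concrete illustration of the post/process split, here is a minimal sketch of how ViPostSend() and ViProcessSend() might look in this emulation, assuming simplified stand-ins for the shmem buffer and the Myrinet wrapper (the real definitions live in VIPL.h and Appendix 1):

    #include <cstring>
    #include <cstdint>

    // Simplified stand-ins; illustrative only.
    struct Descriptor {
        void*    address;  // DS[0].Local.Data.Address in the specification
        uint32_t length;
        bool     done;
    };

    struct SharedMem { char sendbuffer[8192]; };  // window onto Myrinet SRAM

    class Myrinet {
    public:
        SharedMem* shmem = nullptr;
        void init() { /* stub: reroute the DMA transfer direction */ }
        void Send(uint32_t /*nbytes*/) { /* stub: kick the adapter's DMA engine */ }
    };

    class VI {
        Myrinet*    NIC;
        Descriptor* SendQ[64];          // FIFO of posted send descriptors
        int         head = 0, tail = 0; // no wrap/overflow checks, for brevity
    public:
        explicit VI(Myrinet* nic) : NIC(nic) {}

        // Posting only enqueues the descriptor; no data moves yet.
        void ViPostSend(Descriptor* d) { d->done = false; SendQ[tail++ % 64] = d; }

        // Processing pops the head descriptor and copies the data it points
        // to into the shared send buffer: this is the real send.
        void ViProcessSend() {
            Descriptor* d = SendQ[head++ % 64];
            std::memcpy(NIC->shmem->sendbuffer, d->address, d->length);
            NIC->Send(d->length);
            d->done = true;             // ownership returns to the poster
        }
    };

The same pattern applies on the receive side: ViPostRecv() queues a descriptor naming a destination address, and ViProcessRecv() copies whatever arrives from the NIC into that address.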

properties. However, it was not implemented exactly as stated there: it was modified to fit the requirements of this project and the resources of the HCS Lab. As stated before, two aspects were left out of the VI application implementation, plus a third:

a) Threads and multi-threading. To keep the VI clean of the vices of the other protocols, it would be imperative to use a library of lightweight threads; otherwise all the overhead introduced by traditional thread libraries would distort the results.

b) Remote DMA reads and writes. There are two main reasons for leaving these aside: first, they require direct memory manipulation, which is not permitted without the proper system-administrator rights; second, the standard is not entirely clear on how to achieve them.

c) Error handling was not implemented, so an error-free environment should be assumed for all the results.

4. EXPERIMENTS

Experiments were directed at three main areas: latency, throughput, and the time overhead attributed to the client and server of the application developed. They were also designed to follow the performance results obtained by [Berry97] and [Eick98]. In fact, the values gathered by those teams show much higher performance than the values gathered at the HCS Lab; the likely reasons are a VI implementation in hardware rather than a software emulation, and a better command of the Myrinet architecture in terms of modes of operation and ways to improve the data transfers. First of all, the SAN used here consisted of two computers, viking and vigilante, both Sun Ultra-2s interconnected through the Myrinet switch version 1.0; Berry and his team used 200 MHz Pentium Pros on a PCI bus, and not much is specified about the application they used. The set of experiments selected consisted of:

- Throughput over Myrinet with and without the VI
- Latency with and without the VI
- Time distribution at the client and server using the VI

The results and analysis are shown in Section 5.

5. RESULTS AND ANALYSIS

The mode of operation chosen for the Myrinet adapter was DMA transfer; as shown in Figure 2, it was not the best option, but it fulfilled the requirement of an easy implementation.
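Before looking at the curves, note how the numbers were collected: latency is taken as the ECHO/REPLY round trip divided by two. A minimal sketch of such a measurement loop, with hypothetical helper names standing in for the EchoClient() path of Section 3 and Unix gettimeofday() timing, is:

    #include <sys/time.h>

    // Hypothetical stand-ins for the echo client path (the real calls go
    // through the VI queues and the Myrinet object).
    static void echo_send(const char* /*buf*/, int /*nbytes*/) { /* stub */ }
    static void echo_wait_reply(char* /*buf*/, int /*nbytes*/) { /* stub */ }

    static double now_usec() {
        timeval tv;
        gettimeofday(&tv, 0);
        return tv.tv_sec * 1e6 + tv.tv_usec;
    }

    // One-way latency estimated as half the echo round-trip time,
    // averaged over many iterations to smooth out timer resolution.
    double one_way_latency_usec(char* buf, int payload, int iters) {
        double t0 = now_usec();
        for (int i = 0; i < iters; ++i) {
            echo_send(buf, payload);
            echo_wait_reply(buf, payload);
        }
        double t1 = now_usec();
        return (t1 - t0) / iters / 2.0;
    }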

[Figure 2. Throughput measurements for Myrinet using different modes of operation: throughput (Mbytes/s) versus payload in bytes for DMA transfers, mem_map transfers, and a TCP_STREAM test.]

As shown there, the DMA transfer does not improve performance even when the payload is only 64 bytes long; moreover, a TCP_STREAM test run with netperf achieves better performance. But the main goal of this paper is not to find a better mode of operation for Myrinet; it is to show that the VIA is a sound concept that can be used in SANs. With this in mind, everything found from here on would simply carry over, scaled by whatever performance factor a better mode of operation provides. The first measurement is the latency with and without the VI: RAW-Myrinet represents the application without the VI overhead (bulk data transfers), and VI-Myrinet represents the latency of the ECHO/REPLY round trip divided by two.

[Figure 3. Latency of the Myrinet interface (microseconds versus payload in bytes) for raw Myrinet data transfers and for the VI on top of Myrinet.]

From Figure 3 it can be concluded that the increase in latency is roughly constant and no greater than 25%. If the results shown here are compared with the ones reported by Berry, the latencies differ by a ratio of about 4:1, ours being the larger. However, one aspect not taken into account by Berry or by this project is that the VIA specification defines a maximum transfer unit of 32 KB, while all the measurements by the other researchers were taken at payloads no greater than 8 KB.

[Figure 4. Throughput of the VI and of the raw data transfer with Myrinet (Mbytes/sec versus payload), comparing VI-Myrinet and DMA-Myrinet.]

In terms of throughput, performance drops by about 40% between the raw data transfer and the transfer done through the VI. This value was not expected, and unfortunately there are no references to compare against: VI performance is generally compared between a kernel-agent implementation and a VI emulation, not against raw transfer performance. Beyond the throughput itself, it is necessary to find where the performance bottleneck stands, in other words where that 40% is lost. To discover this, time stamping was performed throughout the client and server applications. Although the measurement was made at both the client and the server, the client is the more representative of the two peers.

For this reason, Figure 5 shows the distribution of time across every stage of the application. In the VIA, descriptor processing is basically negligible, and most of the time is spent in the data transfer and in the reception of the reply (waiting for the receive descriptor).

[Figure 5. Time distribution of the ECHO/REPLY application at the client (MTU = 8192). The stages are: setting the send descriptor, posting the descriptor, processing the send, the Myrinet send (DMA), waiting for the receive descriptor, receiving the descriptor (CQ ready), and reading the data. The Myrinet DMA send and the wait for the reply dominate at 39% and 50%; the remaining stages account for roughly 0-5% each.]

This behavior is expected, because the roughly 30% spent in Myrinet-to-Myrinet transfer occurs at both ends: if about 12% is spent at each end, that represents approximately 24% of overhead, and the 50% spent waiting, of which about 30% is the server's reply, leaves roughly 10% more, ending up at 30-35% of processing overhead. This processing overhead is constant, as shown in Figure 6, and reaching better performance levels is a matter of improving the Myrinet-to-Myrinet transmission.

[Figure 6. Time spent processing descriptors (microseconds versus payload) on the client side and the server side.]

This behavior (in Figure 6) is explained by the algorithm itself: a block is sent from the client to the server, the client waits for that block of data, the server copies the data pointed to by the descriptor and writes a send-queue descriptor with the same data, and the data is sent back to the client. In other words, only one descriptor is needed per send, and all work queues handle only one element at a time.

6. CONCLUSIONS

First, a proof of concept has been achieved at the HCS Lab: the philosophy of the VIA in a SAN can be applied, reducing the complexity of the OSI model and of layered protocols. The implementation developed introduces an overhead of 10% for descriptor and data processing at each end. For an echo/reply application the average overhead, taking raw Myrinet transmission using DMA as the reference, is 40%. The average latency added by the VIA to the application is at most 25%. It turns out that DMA transfer was not the best choice; using another technique is recommended.

7. FUTURE RESEARCH

Further research could be done on the implementation. First, improve the Myrinet-to-Myrinet communication by using mem_map instead of DMA. Implement error checking and remote DMA reads and writes. Make use of SCALE_Threads or another lightweight thread library, and use a multi-issue implementation. In addition, the completion, receive, and send queues could be switched from a simple FIFO to something more efficient, such as a hash table, which would keep processing overhead low and could improve performance.

ACKNOWLEDGEMENTS

I would like to thank Wade Cherry for his introduction to and explanation of the LANai and Myrinet applications. I would also like to thank the team at Intel for defining the VIPL.H library, for providing the source code for free use through the Internet, and for their Visual C++ application, from which I gathered many ideas and finally understood the philosophy of VIA.

REFERENCES

[Banks93] Banks, D. and Prudence, M., "A High-Performance Network Architecture for a PA-RISC Workstation", IEEE Journal on Selected Areas in Communications, Vol. 11, No. 2, February 1993.

[Berry97] Berry, F. and Deleganes, E., "The Virtual Interface Architecture Proof-of-Concept Performance Results", Intel Corp. white paper.

[Davi93] Davie, B., "Architecture and Implementation of a High-Speed Host Interface", IEEE Journal on Selected Areas in Communications, Vol. 11, No. 2, February 1993.

[Dunn98] Dunning, D., et al., "The Virtual Interface Architecture", IEEE Micro, Vol. 18, No. 2, April 1998.

[Eick98] Von Eicken, T. and Vogels, W., "Evolution of the Virtual Interface Architecture", IEEE Computer, November 1998.

[Rama93] Ramakrishnan, K., "Performance Considerations in Designing Network Interfaces", IEEE Journal on Selected Areas in Communications, Vol. 11, No. 2, February 1993.

[Rosu95] Rosu, M., "Processor Controller Off-Processor I/O", Cornell University, Grant ARPA/ONR N J-1866, August 1995.

[Steen97] Steenkiste, P., "A High Speed Network Interface for Distributed-Memory Systems: Architecture and Applications", ACM Transactions on Computer Systems, Vol. 15, No. 1, February 1997.

[Wels98] Welsh, M., et al., "Memory Management for User-Level Network Interfaces", IEEE Micro, Vol. 18, No. 2, April 1998.

Web Pages

[VIA98]
[INT98]

APPENDICES

Appendix 1. Source Code for the VIA_Server
