A Modular High Performance Implementation of the Virtual Interface Architecture

Patrick Bozeman, Bill Saphir
National Energy Research Scientific Computing Center (NERSC)
Lawrence Berkeley National Laboratory

1. Overview

The Virtual Interface Architecture (VIA) is an industry standard for low-latency, high-bandwidth interprocess communication over system area networks (SANs). The VIA specification describes a software interface for fully protected user-level communication that can be accelerated by relatively inexpensive VIA-aware hardware. We describe M-VIA, a modular, high-performance and freely available implementation of VIA for Linux.

M-VIA makes two significant contributions to the state of the art. First, M-VIA's modularity allows it to support many types of network interfaces (NICs), including legacy NICs and newer smart NICs that have special support for VIA. This high degree of portability has not been achieved or attempted by other user-space communication projects. M-VIA's modularity introduces little overhead, so M-VIA achieves high performance. Second, M-VIA provides to applications a portable and robust interface, verifiably conforming to the VIA standard, including connection management, error detection, error recovery, and precisely defined semantics. These features make it suitable as a reference implementation and as a base for commercial software development. Previous proof-of-concept research projects have demonstrated high performance but have not emphasized robustness in either the interface or the implementation.

M-VIA is implemented as a set of loadable kernel modules for Linux and a user-level library. It supports so-called VIA doorbells where they are provided by VIA-aware hardware, and implements software doorbells with a fast trap (a trap to privileged mode that does not incur the overhead of a system call) for legacy hardware. Transfer of data occurs directly from an application's address space, with no copy other than what is required by the network interface, and no operating system overhead in the critical path. M-VIA coexists with traditional networking, allowing a single network to be used for both VIA and IP traffic.

2. Overview of Virtual Interface Architecture (VIA)

Academic researchers have developed a variety of techniques for performing very low overhead communication on almost any network. Their research has shown that one can avoid the copying and processing overhead associated with TCP, as well as the overhead of a system call, while still providing full protection. Well-known examples are Active Messages [Eicken92], U-Net [Basu96], and Fast Messages [Pakin95]. These projects have demonstrated a proof of concept, but they have not been widely adopted, even within the high performance computing community. IP remains the only protocol that is widely available.

Virtual Interface Architecture (VIA) is a production-oriented, high-performance communication mechanism for system area networks (SANs). Its design was strongly influenced by the academic research on low-overhead communication as well as experience with MPPs [Pierce94]. Like these projects, VIA provides fully protected user-level access to a network interface. Because of its widespread industry support (Intel, Compaq and Microsoft are the three primary promoters of VIA), it is likely that VIA will become widely adopted. Moreover, VIA can be accelerated by relatively inexpensive VIA-aware hardware, and such hardware will more naturally support VIA than competing communication mechanisms. Examples include Giganet [Giganet98], Synfinity [Larson98] and ServerNet-II [Tandem95].

The VIA 1.0 specification [VIA97] was finished in December 1997, after feedback from over a hundred industrial and academic contributors, including the authors of this paper. It provides send and receive operations for message passing, as well as remote memory access operations, which allow read/write access to the memory of a remote process without the explicit cooperation of that process. VIA communication is categorized as unreliable delivery, reliable delivery, or reliable reception. Implementations may provide one or more of these modes, usually depending on characteristics of the hardware (though software may implement reliable VIA on unreliable hardware). VIA provides protected zero-copy data transfer (where supported by network hardware), without requiring operating system kernel assistance. VIA requires that memory used in communication be registered by the application prior to communication, to avoid page faults on transmission or reception of data. Higher-level communication APIs such as the Message Passing Interface (MPI) can be efficiently layered on VIA [Dimitrov99].

While we are primarily interested in scientific computation, VIA has a number of commercial applications in the area of high performance servers. Other important commercial drivers for VIA are the forthcoming NGIO and Future IO standards for high performance peripherals. NGIO is expected to rely on VIA as its transport mechanism.

The VI Architecture consists of three components. The user-visible component is a library known as VIPL (VI Provider Library) that contains routines for data transfer, connection management, queue management, memory registration and error handling. The second component, the VI Kernel Agent, provides necessary kernel services, including connection management and memory registration. The third component, the VI Network Interface (VI NIC), performs the actual data transfer.
It is conceptually a piece of hardware, but may be implemented as a combination of hardware and software. The NIC can directly access user memory and provides a doorbell (usually a memory-mapped register) that VIPL uses to notify the NIC that new entries have been placed in VI work queues. To send and receive messages, a user application writes a VIA descriptor in an area of registered memory, and calls a VIPL routine that rings the doorbell to let the NIC know that the descriptor is available for processing.

Of the three major components, only VIPL is specified in detail by the VIA standard, and even this specification is only a recommendation. To enable truly portable applications, Intel wrote the VI Architecture Developer's Guide [Intel98], which specifies the VIA API in much more detail. Intel released an extensive conformance test suite to determine whether VIA implementations are in compliance. VIA applications that use VIPL (as clarified by the Developer's Guide) should be portable between different conforming VIA implementations. The majority of the VIA community supports the adoption of the standard interface specified in the Developer's Guide.

3. The M-VIA Implementation of VIA

We have developed a high-performance modular implementation of VIA for the Linux operating system called Modular VIA (M-VIA). M-VIA is implemented as a user-level library (libvipl.a) and at least two loadable kernel modules for Linux. The core module is device-independent and provides the majority of functionality needed by VIA. One or more device-specific modules, called device modules, implement device-specific functionality. A device module is essentially a device driver, and includes the standard device driver code plus M-VIA-specific modifications.

M-VIA Modular Design

A primary design goal of M-VIA is to enable the rapid implementation of VIA for new network interfaces, including legacy dumb NICs as well as newer smart NICs with either special VIA support (e.g. support for VIA doorbells and VIA descriptor processing) or programmable processors. M-VIA achieves this goal through a modular implementation. It provides a complete VIA framework, but allows a device module to replace a subset of VIA functionality in a device-specific way. With no hardware support, we describe a VIA implementation as software-only, and otherwise call it hardware-accelerated. This modular division between core management and device-specific operations facilitates the rapid development of support for new devices.

In a hardware-accelerated implementation, the device module can register hardware functionality to allow the hardware to take over core functions, such as memory and doorbell management. In particular, VIA doorbells for VIA-aware hardware are usually implemented as memory-mapped registers read and written by user-level code to tell the network interface that new descriptors have been posted. With hardware acceleration, M-VIA requires no memory-to-memory copies to transfer data.

In a software-only implementation, it is critical that the doorbell operation have as little overhead as possible. M-VIA uses a fast trap to execute privileged code with minimum overhead. A fast trap incurs significantly less overhead than a system call, which performs additional operations related to scheduling and signal processing. The 38 instructions written in assembly code to implement the fast trap constitute the only processor-specific code in M-VIA (currently the x86 architecture is supported; Alpha support and PowerPC support are planned). In a software-only implementation, data transmission requires a single memory-to-memory copy inside the interrupt handler at the receiver. This copy is unavoidable for protected communication without special hardware support.

M-VIA provides wire-level interoperability among software-only NICs. This is facilitated by an additional abstraction called a Device Class, which is a framework within the device module for handling devices with similar characteristics. For instance, an EtherRing class can be used for Ethernet devices with a circular queue of buffer descriptors.
The majority of Ethernet devices use this as their internal architecture. Of course, wire-level interoperability is not restricted to such a class, only facilitated by it.

Modularity does not adversely affect performance. Time-sensitive operations, such as the actual transmission of data, are fast-pathed. Specifically, communication between Devices and Device Classes is through macros rather than through function calls, and VIA doorbell operations for software NICs are implemented with fast traps.

M-VIA achieves high bandwidth for software-only NICs by incorporating virtual memory management into the core module, enabling the transfer of data directly from an application's address space, with no additional memory copies other than those required by the network interface. A side benefit of this approach is that communication within an SMP requires only a single memory copy, whereas arbitrary message passing between separate address spaces requires two copies for any mechanism that is implemented purely in user space. Thus, bandwidth of non-pipelined VIA communication between two processes on an SMP is approximately two times higher than achievable through other mechanisms.

M-VIA Core Module

The M-VIA core module is divided into device-independent, reusable, functional components:

Connection Manager: Establishes logical point-to-point connections between VIs.
Protection Tag Manager: Allocates, deallocates, and validates memory protection tags.
Registered Memory Manager: Handles the registration of user communication buffers and descriptor buffers.
Completion Queue Manager: Manages the optional completion queues associated with VI work queues, as well as user requests to block on completion.
Error Queue Manager: Provides a mechanism for VIA devices to post asynchronous errors and for the asynchronous error handling threads of VI applications to block on errors.
Linux Kernel Extensions: Provides functionality required for efficient implementation, including condition variables, user-to-kernel memory remapping, and user address to physical address translation.
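
To make the roles of the Protection Tag Manager and the Registered Memory Manager more concrete, the following Python sketch models their bookkeeping in user space. It is only an illustration under stated assumptions: the class and method names (`ProtectionTagManager`, `RegisteredMemoryManager`, `check_access`) and the integer handles are hypothetical, not M-VIA or VIPL identifiers, and a real implementation would additionally pin pages and translate addresses in the kernel.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Region:
    """One registered memory region (hypothetical layout)."""
    start: int
    length: int
    tag: int


class ProtectionTagManager:
    """Allocates, deallocates, and validates protection tags."""

    def __init__(self):
        self._next = count(1)
        self._live = set()

    def allocate(self):
        tag = next(self._next)
        self._live.add(tag)
        return tag

    def deallocate(self, tag):
        self._live.discard(tag)

    def is_valid(self, tag):
        return tag in self._live


class RegisteredMemoryManager:
    """Tracks registered buffers and checks accesses against them."""

    def __init__(self, tags):
        self._tags = tags
        self._regions = {}
        self._handles = count(1)

    def register(self, start, length, tag):
        # Registration must happen before any communication uses the buffer.
        if length <= 0:
            raise ValueError("length must be positive")
        if not self._tags.is_valid(tag):
            raise ValueError("unknown protection tag")
        handle = next(self._handles)
        self._regions[handle] = Region(start, length, tag)
        return handle

    def deregister(self, handle):
        self._regions.pop(handle, None)

    def check_access(self, handle, addr, nbytes, tag):
        # An access is allowed only if it lies entirely inside the registered
        # region and presents the same, still-valid, protection tag.
        region = self._regions.get(handle)
        if region is None or nbytes <= 0 or not self._tags.is_valid(tag):
            return False
        end = region.start + region.length
        in_bounds = region.start <= addr and addr + nbytes <= end
        return in_bounds and region.tag == tag
```

In this model a descriptor that names an unregistered handle, runs past the end of its region, or carries a mismatched tag is rejected before any data would move.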

The core module provides the default functionality for all VIA operations. To perform device-specific functions, the framework components listed above call routines registered by specific device modules. For example, the Connection Manager handles the common support issues relating to queuing requests: blocking for connection completion, verifying connection attributes, assigning a unique connection id, etc. However, the Connection Manager calls functions registered by the device module to actually perform the transmission of the request, acceptance, or rejection of a connection to a remote device.

Operations that are entirely device-specific, such as the creation and destruction of VIs and the transmission of data to and from the wire, are passed directly to the appropriate device. However, the core framework provides some functions to make such operations easier to implement. For example, generic descriptor processing routines are provided for use by software-only devices.

Device Modules

A device module provides the abstraction of a VI NIC. When a device module registers itself with the core module, the device module informs the core module of its capabilities, such as whether it supports VIA directly in hardware, its native MTU size, the maximum number of VIA descriptors that can be queued for transmission, etc. The device module also registers device-specific functions to be used by the modular managers from the core module. The developer of the device module has the option of overriding any and all of the default functionality provided by the core module. For example, if a device that provides native VIA hardware support uses its own mechanism for registering memory, it may completely replace the Registered Memory Manager with an implementation of its own.

Device Classes

Many commodity network interfaces can be logically grouped into common categories such as Ethernet, ATM, FDDI, etc. In order to promote wire-level interoperability and rapid development through code reuse, device modules can be written using an internal abstraction called a Device Class. M-VIA device classes are slightly finer-grained than network types; the EtherRing category mentioned above is one example. Device Classes enable common routines for a class of network interfaces to be shared by device modules. Such routines include operations such as the construction and interpretation of media-specific VIA headers and mechanisms for enabling VIA to coexist with traditional networking protocols, e.g. TCP/IP. While Device Classes are not explicitly supported by the device module, the device module interface is designed to facilitate their use. Macros are used for communication between device-specific code and a device class, and these are integrated into a device module.

The VI Provider Library (VIPL)

M-VIA contains a single VI Provider Library, VIPL, which is interoperable with software-only and hardware-accelerated VIA devices developed within the M-VIA framework. Device modules specify to VIPL whether the VI Provider Library should use ioctl system calls or fast traps to call time-sensitive VI Kernel Agent services. Device modules also specify whether the VIA doorbell mechanism is supported directly in hardware as a true memory-mapped doorbell or should be emulated with a fast trap.

M-VIA 2

Based on experience gained with M-VIA 1, we have begun the design of a modified internal organization in M-VIA 2. The modifications affect both the VIPL and the Core Module. M-VIA 2 design documents are available online.
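
The registration scheme described above for device modules, in which a device reports its capabilities and may replace any subset of the core defaults, can be sketched in Python as follows. This is a minimal model and not M-VIA code: `CoreModule`, `register_device`, the operation names, and the capability keys (`hardware_doorbell`, `mtu`, `max_descriptors`) are hypothetical names chosen for illustration, and the capability values in the usage example are made up.

```python
class CoreModule:
    """Holds default handlers; a device module may replace any subset."""

    def __init__(self):
        # Defaults stand in for the core module's software-only paths.
        self._defaults = {
            "register_memory": lambda buf: ("core", "register_memory", buf),
            "ring_doorbell": lambda vi: ("core", "fast_trap", vi),
        }
        self._devices = {}

    def register_device(self, name, capabilities, overrides=None):
        overrides = overrides or {}
        unknown = set(overrides) - set(self._defaults)
        if unknown:
            raise KeyError(f"no such core operation: {sorted(unknown)}")
        handlers = dict(self._defaults)   # start from the core defaults
        handlers.update(overrides)        # device-specific replacements win
        self._devices[name] = (dict(capabilities), handlers)

    def call(self, device, op, *args):
        _, handlers = self._devices[device]
        return handlers[op](*args)

    def doorbell_mode(self, device):
        # The library side asks how the doorbell should be rung for a device.
        caps, _ = self._devices[device]
        return "memory_mapped" if caps.get("hardware_doorbell") else "fast_trap"
```

A short usage example, again with illustrative values only:

```python
core = CoreModule()
core.register_device("legacy0", {"hardware_doorbell": False, "mtu": 1500,
                                 "max_descriptors": 64})
core.register_device("smart0", {"hardware_doorbell": True, "mtu": 1500,
                                "max_descriptors": 1024},
                     overrides={"ring_doorbell": lambda vi: ("device", "mmio_write", vi)})
print(core.doorbell_mode("legacy0"), core.doorbell_mode("smart0"))
```

Keeping the defaults in one table and copying them per device means a new device module only has to supply the operations it actually accelerates.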

The original design of M-VIA was based upon early drafts of the Virtual Interface Architecture Specification. Unfortunately, when the VI Architecture 1.0 specification was released, it relaxed the specification in areas relating to hardware interaction, becoming a specification of the user-level VIA component only. This change requires devices to be capable of providing custom user-level functionality to operate efficiently. Currently, NIC-specific functionality can be substituted in the Kernel Agent only.

Two specific examples of this are doorbells and completion queues. In pre-1.0 versions of the VI Architecture specification, doorbell operations used a standardized Doorbell Token format. The Doorbell Token format is no longer specified in VIA 1.0 (including in the Developer's Guide). A similar problem occurred with the introduction of Completion Queues in the VI Architecture Specification 1.0. To be implemented efficiently, Completion Queues require direct communication between the VI NIC and VIPL. However, the mechanism and data structures used to accomplish this are not defined. The modularized VIPL implementation in M-VIA 2 will enable the substitution of device-specific functionality at the user level as well as inside the kernel.

Functionality and Conformance

The Intel Virtual Interface Architecture Developer's Guide describes three levels of conformance to the VIPL API: Early Adopter, Functional, and Full conformance. The Intel VI Architecture Conformance Suite [Intel98a] tests an implementation's conformance to the VIPL API. The conformance suite, consisting of over lines of code, performs thousands of individual tests grouped into functional categories: 34 for Early Adopter, 134 for Functional Conformance, and 156 for Full Conformance. Basic VIPL semantic compliance, resource management, proper handling of error conditions, invalid inputs, and network stress tests are included in the conformance suite.
M-VIA passes all of the Early Adopter conformance tests on unreliable networks and includes RDMA Write capability. Reliable Delivery and Reliable Reception will be supported for networks that support these. At the Functional Conformance level, M-VIA implements all functionality except peer-to-peer connection management and resizing of completion queues, including synchronous error handling, remote disconnect notification, and Protection Tag support. M-VIA passes 109 of the 134 Functional Conformance tests included in the test suite. The tests that M-VIA does not pass either contain bugs or call the peer-to-peer connection management routines. The only additional functions missing from the Full Conformance level are the notify routines, which are essentially syntactic sugar.

M-VIA uses POSIX threads (pthreads) internally for asynchronous error notification, and is pthreads-compatible, enabling the development of multi-threaded user applications. Operations performed within a multi-threaded application on different VIs are inherently thread safe, but an application must currently provide its own explicit locks if multiple threads access a single VI. A fully thread-safe VIPL will be part of M-VIA.

Implementation Status

M-VIA 1.0 supports four NIC types: loopback, Fast Ethernet cards based on the DEC Tulip chip, the Packet Engines GNIC-I Gigabit Ethernet card, and the Packet Engines GNIC-II Gigabit Ethernet card. We have focused on only a small number of interfaces for two reasons. First, we anticipated fine-tuning the internal interfaces, and did not want to redo the work of implementing all the drivers. Second, our primary goal was a complete and robust implementation of VIA.

As described in section 3.1.5 (M-VIA 2), we are currently redesigning internal interfaces for VIPL and the VI Kernel Agent, based on experience with the original design, in order to improve support for smart NICs. This redesign will form the basis of M-VIA 2. With M-VIA 2, we expect an explosion of third-party driver development. There are third-party plans to implement drivers for Giganet, Myrinet, ServerNet, Alteon Gigabit Ethernet and several Intel NICs. M-VIA is freely available for download over the Internet.

Performance

While the primary focus of M-VIA development so far has been functionality, robustness and modularity, it achieves excellent performance as well. We present here some basic performance comparisons to demonstrate this fact, leaving a more detailed analysis for another report.

Latency and bandwidth reported below are measured using a simple pingpong benchmark, in which two processes send a buffer of data back and forth. Latency reported below is one-half the round-trip time for 4-byte messages, and bandwidth is message size divided by one-half the round-trip time for byte messages. (This number is an artifact of our benchmark program, which uses exponentially increasing message sizes up to 32K. Bandwidth is not sensitive to this value.) Although this is a crude measure, the same conclusions hold up under more detailed analysis.

The following tables show M-VIA performance (under Linux) and TCP performance under several operating systems, using identical processors and each of three NIC types: Loopback (a virtual loopback device, not involving a PCI device), Tulip-based Fast Ethernet (Kingston), and the Packet Engines GNIC-II Gigabit Ethernet NIC. We used PCs with 400 MHz Pentium II processors and Corsair CAS-2 PC-100 memory on ASUS-P2X motherboards. The Tulip and GNIC-II measurements were made with uniprocessor systems connected back-to-back. The loopback measurements were made on a 2-processor system with the same processors and memory, and an ASUS motherboard from the same family. Linux measurements are based on the SMP kernel; Solaris measurements are based on Solaris 7 for x86; Windows NT measurements are for NT 4.

We observe that M-VIA performance is significantly better than TCP performance in all cases, and that the relative performance of VIA is better for faster networks. While a comparison to TCP performance is not a definitive assessment, it does demonstrate that M-VIA performance is respectable.

                            M-VIA/Linux   TCP/Linux   TCP/Solaris   TCP/NT
Loopback
GNIC-II Gigabit Ethernet                              NA*           82.5
Tulip Fast Ethernet

Table 1: Latency (in microseconds). Lower is better.

                            M-VIA/Linux   TCP/Linux   TCP/Solaris   TCP/NT
Loopback
GNIC-II Gigabit Ethernet                              NA            14.8
Tulip Fast Ethernet

Table 2: Bandwidth (in Megabytes/s). Higher is better.

* A GNIC-II driver is not available for Solaris 7/x86.
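
The latency and bandwidth definitions used for the pingpong benchmark reduce to simple arithmetic on the measured round-trip time. The Python sketch below shows that arithmetic only; the function names are hypothetical, and treating one megabyte as 10^6 bytes is an assumption, since the text does not say which convention the tables use.

```python
def latency_us(round_trip_us):
    """One-way latency: half of the measured round-trip time."""
    return round_trip_us / 2.0


def bandwidth_mb_per_s(message_bytes, round_trip_us):
    """Message size divided by half the round-trip time.

    Assumes 1 MB = 10**6 bytes (an assumption, not stated in the paper).
    """
    one_way_s = (round_trip_us / 2.0) * 1e-6
    return message_bytes / one_way_s / 1e6
```

For example, with purely illustrative inputs (not measurements from this paper), a 50 µs round trip gives `latency_us(50.0)` of 25 µs, and a 32768-byte message with a 2000 µs round trip gives a bandwidth of about 32.8 MB/s.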

Comparisons to other VIA implementations are difficult because we do not have an apples-to-apples comparison on the same hardware. We mention here some other results to provide perspective, though a direct comparison is not appropriate. An early proof-of-concept VIA implementation from Intel on Fast Ethernet hardware had a latency of about 60 µs [Berry97], more than twice the latency we report here for Tulip-based Ethernet. Berkeley VIA, a partial implementation of VIA oriented towards research, has a latency of 35 µs and bandwidth of 51 MB/s on Myrinet. U-Net on Tulip Fast Ethernet [Welsh96] with 200 MHz Pentium Pro processors has a latency of approximately 25 µs. The M-VIA fast-trap mechanism is essentially the same as that used by U-Net, so we expect performance to be nearly identical. Giganet reports a VIA latency of 8.5 µs for their NT implementation of VIA with specialized VIA-aware hardware. In all cases, the biggest bottleneck is ultimately the PCI interface. When NGIO and/or Future IO devices become available, we expect latency to fall considerably.

4. Conclusions and Plans

M-VIA is the first non-proprietary implementation of VIA. Its modular design facilitates rapid implementation on new network adapters and interoperability, without compromising high performance.

An important goal of our work is to provide a reference implementation of VIA that will promote and facilitate the development of high-performance portable VIA applications, and facilitate the development of VIA on other systems. As we have described, M-VIA enables the rapid development of drivers for new NICs, providing portability among NICs. We have additional plans, or know of plans, to port M-VIA to new processors (the only processor-specific code is related to the fast trap mechanism for software-only drivers), to provide portability among processors. Furthermore, although M-VIA obviously has operating system dependencies, we do not believe there are any fundamental difficulties in porting it to new operating systems. M-VIA development started on FreeBSD before moving to Linux, and a preliminary assessment of the feasibility of an NT port [Buonadonna99] is encouraging.

5. References

[Basu96] A. Basu, V. Buch, W. Vogels, T. von Eicken. U-Net: A User-Level Network Interface for Parallel and Distributed Computing. Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP), Copper Mountain, Colorado, December.
[Berry97] F. Berry, E. Deleganes, A. M. Merritt, Intel Corporation. The Virtual Interface Architecture Proof of Concept Performance Results. Available online.
[Boden95] N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. Seitz, J. N. Seizovic, W. Su. Myrinet -- A Gigabit-per-Second Local Area Network. IEEE Micro, Vol. 15, February 1995.
[Buonadonna98] P. Buonadonna, A. Geweke, D. Culler. An Implementation and Analysis of the Virtual Interface Architecture. Proceedings of SC98, Orlando, Florida, November.
[Buonadonna99] P. Buonadonna, private communication. April.
[Clark89] D. D. Clark, V. Jacobson, J. Romkey, H. Salwen. An Analysis of TCP Processing Overhead. IEEE Communications Magazine, June.
[Dimitrov99] Rossen Dimitrov and Anthony Skjellum. An Efficient MPI Implementation for Virtual Interface (VI) Architecture-Enabled Cluster Computing. Proceedings of the MPI Developers Conference.
[Eicken92] T. von Eicken, D. Culler, S. C. Goldstein, K. Schauser. Active Messages: a Mechanism for Integrated Communication and Computation. Proceedings of the 19th Int'l Symposium on Computer Architecture, Gold Coast, Australia, May.
[Giganet98] GigaNet Corporation. High Performance cLAN Host Adapters. Available online.
[Intel98] Intel Corporation. The Intel VI Architecture Developer's Guide V1.0. September. Available at ftp://download.intel.com/design/servers/vi/intel.pdf.
[Intel98a] Intel Corporation. The Intel VI Architecture Conformance Suite User's Guide v0.5. December. Available at ftp://download.intel.com/design/servers/vi/userguide_v0.5.pdf.
[Larson98] J. Larson. The HAL Interconnect PCI Card.
[Pakin95] S. Pakin, M. Lauria, A. Chen. High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet. Proceedings of Supercomputing '95, San Diego, California.
[Pierce94] Paul Pierce and Greg Regnier. The Paragon Message Passing Interface Paper. SHPCC94, 1994.
[Tandem95] Tandem Corporation. ServerNet Interconnect Technology.
[VIA97] Compaq Computer Corp., Intel Corporation, Microsoft Corporation. Virtual Interface Architecture Specification. Available online.
[Welsh96] Matt Welsh, Anindya Basu, Thorsten von Eicken. Low-Latency Communication over Fast Ethernet. Proceedings of Euro-Par '96, Lyon, France, August 27-29, 1996.


More information

Profile-Based Load Balancing for Heterogeneous Clusters *

Profile-Based Load Balancing for Heterogeneous Clusters * Profile-Based Load Balancing for Heterogeneous Clusters * M. Banikazemi, S. Prabhu, J. Sampathkumar, D. K. Panda, T. W. Page and P. Sadayappan Dept. of Computer and Information Science The Ohio State University

More information

SOVIA: A User-level Sockets Layer Over Virtual Interface Architecture

SOVIA: A User-level Sockets Layer Over Virtual Interface Architecture SOVIA: A User-level Sockets Layer Over Virtual Interface Architecture Jin-Soo Kim, Kangho Kim, and Sung-In Jung Electronics and Telecommunications Research Institute (ETRI) Daejeon 305-350, Korea E-mail:

More information

RTI Performance on Shared Memory and Message Passing Architectures

RTI Performance on Shared Memory and Message Passing Architectures RTI Performance on Shared Memory and Message Passing Architectures Steve L. Ferenci Richard Fujimoto, PhD College Of Computing Georgia Institute of Technology Atlanta, GA 3332-28 {ferenci,fujimoto}@cc.gatech.edu

More information

Can User-Level Protocols Take Advantage of Multi-CPU NICs?

Can User-Level Protocols Take Advantage of Multi-CPU NICs? Can User-Level Protocols Take Advantage of Multi-CPU NICs? Piyush Shivam Dept. of Comp. & Info. Sci. The Ohio State University 2015 Neil Avenue Columbus, OH 43210 shivam@cis.ohio-state.edu Pete Wyckoff

More information

Performance of DB2 Enterprise-Extended Edition on NT with Virtual Interface Architecture

Performance of DB2 Enterprise-Extended Edition on NT with Virtual Interface Architecture Performance of DB2 Enterprise-Extended Edition on NT with Virtual Interface Architecture Sivakumar Harinath 1, Robert L. Grossman 1, K. Bernhard Schiefer 2, Xun Xue 2, and Sadique Syed 2 1 Laboratory of

More information

Advanced Computer Networks. End Host Optimization

Advanced Computer Networks. End Host Optimization Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct

More information

A First Implementation of In-Transit Buffers on Myrinet GM Software Λ

A First Implementation of In-Transit Buffers on Myrinet GM Software Λ A First Implementation of In-Transit Buffers on Myrinet GM Software Λ S. Coll, J. Flich, M. P. Malumbres, P. López, J. Duato and F.J. Mora Universidad Politécnica de Valencia Camino de Vera, 14, 46071

More information

Optimizing TCP in a Cluster of Low-End Linux Machines

Optimizing TCP in a Cluster of Low-End Linux Machines Optimizing TCP in a Cluster of Low-End Linux Machines ABDALLA MAHMOUD, AHMED SAMEH, KHALED HARRAS, TAREK DARWICH Dept. of Computer Science, The American University in Cairo, P.O.Box 2511, Cairo, EGYPT

More information

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services SEDA: An Architecture for Well-Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University of California, Berkeley Operating Systems Principles

More information

High performance communication subsystem for clustering standard high-volume servers using Gigabit Ethernet

High performance communication subsystem for clustering standard high-volume servers using Gigabit Ethernet Title High performance communication subsystem for clustering standard high-volume servers using Gigabit Ethernet Author(s) Zhu, W; Lee, D; Wang, CL Citation The 4th International Conference/Exhibition

More information

Motivation CPUs can not keep pace with network

Motivation CPUs can not keep pace with network Deferred Segmentation For Wire-Speed Transmission of Large TCP Frames over Standard GbE Networks Bilic Hrvoye (Billy) Igor Chirashnya Yitzhak Birk Zorik Machulsky Technion - Israel Institute of technology

More information

Lessons learned from MPI

Lessons learned from MPI Lessons learned from MPI Patrick Geoffray Opinionated Senior Software Architect patrick@myri.com 1 GM design Written by hardware people, pre-date MPI. 2-sided and 1-sided operations: All asynchronous.

More information

Virtualization, Xen and Denali

Virtualization, Xen and Denali Virtualization, Xen and Denali Susmit Shannigrahi November 9, 2011 Susmit Shannigrahi () Virtualization, Xen and Denali November 9, 2011 1 / 70 Introduction Virtualization is the technology to allow two

More information

Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview Use Cases Architecture Features Copyright Jaluna SA. All rights reserved

Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview Use Cases Architecture Features Copyright Jaluna SA. All rights reserved C5 Micro-Kernel: Real-Time Services for Embedded and Linux Systems Copyright 2003- Jaluna SA. All rights reserved. JL/TR-03-31.0.1 1 Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview

More information

Parallel Computing Trends: from MPPs to NoWs

Parallel Computing Trends: from MPPs to NoWs Parallel Computing Trends: from MPPs to NoWs (from Massively Parallel Processors to Networks of Workstations) Fall Research Forum Oct 18th, 1994 Thorsten von Eicken Department of Computer Science Cornell

More information

An Extensible Message-Oriented Offload Model for High-Performance Applications

An Extensible Message-Oriented Offload Model for High-Performance Applications An Extensible Message-Oriented Offload Model for High-Performance Applications Patricia Gilfeather and Arthur B. Maccabe Scalable Systems Lab Department of Computer Science University of New Mexico pfeather@cs.unm.edu,

More information

Seekable Sockets: A Mechanism to Reduce Copy Overheads in TCP-based Messaging

Seekable Sockets: A Mechanism to Reduce Copy Overheads in TCP-based Messaging Seekable Sockets: A Mechanism to Reduce Copy Overheads in TCP-based Messaging Chase Douglas and Vijay S. Pai Purdue University West Lafayette, IN 47907 {cndougla, vpai}@purdue.edu Abstract This paper extends

More information

Building MPI for Multi-Programming Systems using Implicit Information

Building MPI for Multi-Programming Systems using Implicit Information Building MPI for Multi-Programming Systems using Implicit Information Frederick C. Wong 1, Andrea C. Arpaci-Dusseau 2, and David E. Culler 1 1 Computer Science Division, University of California, Berkeley

More information

Eliminating the Protocol Stack for Socket based Communication in Shared Memory Interconnects

Eliminating the Protocol Stack for Socket based Communication in Shared Memory Interconnects Eliminating the Protocol Stack for Socket based Communication in Shared Memory Interconnects Stein Jørgen Ryan and Haakon Bryhni Department of Informatics, University of Oslo PO Box 1080, Blindern, N-0316

More information

Security versus Performance Tradeoffs in RPC Implementations for Safe Language Systems

Security versus Performance Tradeoffs in RPC Implementations for Safe Language Systems Security versus Performance Tradeoffs in RPC Implementations for Safe Language Systems Chi-Chao Chang, Grzegorz Czajkowski, Chris Hawblitzel, Deyu Hu, and Thorsten von Eicken Department of Computer Science

More information

Infiniband Fast Interconnect

Infiniband Fast Interconnect Infiniband Fast Interconnect Yuan Liu Institute of Information and Mathematical Sciences Massey University May 2009 Abstract Infiniband is the new generation fast interconnect provides bandwidths both

More information

1/5/2012. Overview of Interconnects. Presentation Outline. Myrinet and Quadrics. Interconnects. Switch-Based Interconnects

1/5/2012. Overview of Interconnects. Presentation Outline. Myrinet and Quadrics. Interconnects. Switch-Based Interconnects Overview of Interconnects Myrinet and Quadrics Leading Modern Interconnects Presentation Outline General Concepts of Interconnects Myrinet Latest Products Quadrics Latest Release Our Research Interconnects

More information

Loaded: Server Load Balancing for IPv6

Loaded: Server Load Balancing for IPv6 Loaded: Server Load Balancing for IPv6 Sven Friedrich, Sebastian Krahmer, Lars Schneidenbach, Bettina Schnor Institute of Computer Science University Potsdam Potsdam, Germany fsfried, krahmer, lschneid,

More information

LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster

LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster H. W. Jin, S. Sur, L. Chai, and D. K. Panda Network-Based Computing Laboratory Department of Computer Science and Engineering

More information

Performance Evaluation of InfiniBand with PCI Express

Performance Evaluation of InfiniBand with PCI Express Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Server Technology Group IBM T. J. Watson Research Center Yorktown Heights, NY 1598 jl@us.ibm.com Amith Mamidala, Abhinav Vishnu, and Dhabaleswar

More information

Performance Evaluation of InfiniBand with PCI Express

Performance Evaluation of InfiniBand with PCI Express Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Amith Mamidala Abhinav Vishnu Dhabaleswar K Panda Department of Computer and Science and Engineering The Ohio State University Columbus,

More information

Operating System Architecture. CS3026 Operating Systems Lecture 03

Operating System Architecture. CS3026 Operating Systems Lecture 03 Operating System Architecture CS3026 Operating Systems Lecture 03 The Role of an Operating System Service provider Provide a set of services to system users Resource allocator Exploit the hardware resources

More information

Switch. Switch. PU: Pentium Pro 200MHz Memory: 128MB Myricom Myrinet 100Base-T Ethernet

Switch. Switch. PU: Pentium Pro 200MHz Memory: 128MB Myricom Myrinet 100Base-T Ethernet COMPaS: A Pentium Pro PC-based SMP Cluster and its Experience Yoshio Tanaka 1, Motohiko Matsuda 1, Makoto Ando 1, Kazuto Kubota and Mitsuhisa Sato 1 Real World Computing Partnership fyoshio,matu,ando,kazuto,msatog@trc.rwcp.or.jp

More information

Performance of the MP_Lite message-passing library on Linux clusters

Performance of the MP_Lite message-passing library on Linux clusters Performance of the MP_Lite message-passing library on Linux clusters Dave Turner, Weiyi Chen and Ricky Kendall Scalable Computing Laboratory, Ames Laboratory, USA Abstract MP_Lite is a light-weight message-passing

More information

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup Chapter 4 Routers with Tiny Buffers: Experiments This chapter describes two sets of experiments with tiny buffers in networks: one in a testbed and the other in a real network over the Internet2 1 backbone.

More information

MPI History. MPI versions MPI-2 MPICH2

MPI History. MPI versions MPI-2 MPICH2 MPI versions MPI History Standardization started (1992) MPI-1 completed (1.0) (May 1994) Clarifications (1.1) (June 1995) MPI-2 (started: 1995, finished: 1997) MPI-2 book 1999 MPICH 1.2.4 partial implemention

More information

Agenda. Threads. Single and Multi-threaded Processes. What is Thread. CSCI 444/544 Operating Systems Fall 2008

Agenda. Threads. Single and Multi-threaded Processes. What is Thread. CSCI 444/544 Operating Systems Fall 2008 Agenda Threads CSCI 444/544 Operating Systems Fall 2008 Thread concept Thread vs process Thread implementation - user-level - kernel-level - hybrid Inter-process (inter-thread) communication What is Thread

More information

RoCE vs. iwarp Competitive Analysis

RoCE vs. iwarp Competitive Analysis WHITE PAPER February 217 RoCE vs. iwarp Competitive Analysis Executive Summary...1 RoCE s Advantages over iwarp...1 Performance and Benchmark Examples...3 Best Performance for Virtualization...5 Summary...6

More information

Design and Implementation of a Monitoring and Scheduling System for Multiple Linux PC Clusters*

Design and Implementation of a Monitoring and Scheduling System for Multiple Linux PC Clusters* Design and Implementation of a Monitoring and Scheduling System for Multiple Linux PC Clusters* Chao-Tung Yang, Chun-Sheng Liao, and Ping-I Chen High-Performance Computing Laboratory Department of Computer

More information

Design issues and performance comparisons in supporting the sockets interface over user-level communication architecture

Design issues and performance comparisons in supporting the sockets interface over user-level communication architecture J Supercomput (2007) 39: 205 226 DOI 10.1007/s11227-007-0109-5 Design issues and performance comparisons in supporting the sockets interface over user-level communication architecture Jae-Wan Jang Jin-Soo

More information

1 Introduction Myrinet grew from the results of two ARPA-sponsored projects. Caltech's Mosaic and the USC Information Sciences Institute (USC/ISI) ATO

1 Introduction Myrinet grew from the results of two ARPA-sponsored projects. Caltech's Mosaic and the USC Information Sciences Institute (USC/ISI) ATO An Overview of Myrinet Ralph Zajac Rochester Institute of Technology Dept. of Computer Engineering EECC 756 Multiple Processor Systems Dr. M. Shaaban 5/18/99 Abstract The connections between the processing

More information

CSE 4/521 Introduction to Operating Systems. Lecture 29 Windows 7 (History, Design Principles, System Components, Programmer Interface) Summer 2018

CSE 4/521 Introduction to Operating Systems. Lecture 29 Windows 7 (History, Design Principles, System Components, Programmer Interface) Summer 2018 CSE 4/521 Introduction to Operating Systems Lecture 29 Windows 7 (History, Design Principles, System Components, Programmer Interface) Summer 2018 Overview Objective: To explore the principles upon which

More information

PCI Express System Interconnect Software Architecture for PowerQUICC TM III-based Systems

PCI Express System Interconnect Software Architecture for PowerQUICC TM III-based Systems PCI Express System Interconnect Software Architecture for PowerQUICC TM III-based Systems Application Note AN-573 By Craig Hackney Introduction A multi-peer system using a standard-based PCI Express multi-port

More information

Under the Hood, Part 1: Implementing Message Passing

Under the Hood, Part 1: Implementing Message Passing Lecture 27: Under the Hood, Part 1: Implementing Message Passing Parallel Computer Architecture and Programming CMU 15-418/15-618, Fall 2017 Today s Theme 2 Message passing model (abstraction) Threads

More information

Making TCP Viable as a High Performance Computing Protocol

Making TCP Viable as a High Performance Computing Protocol Making TCP Viable as a High Performance Computing Protocol Patricia Gilfeather and Arthur B. Maccabe Scalable Systems Lab Department of Computer Science University of New Mexico pfeather@cs.unm.edu maccabe@cs.unm.edu

More information

QuickSpecs. HP Z 10GbE Dual Port Module. Models

QuickSpecs. HP Z 10GbE Dual Port Module. Models Overview Models Part Number: 1Ql49AA Introduction The is a 10GBASE-T adapter utilizing the Intel X722 MAC and X557-AT2 PHY pairing to deliver full line-rate performance, utilizing CAT 6A UTP cabling (or

More information

LAPI on HPS Evaluating Federation

LAPI on HPS Evaluating Federation LAPI on HPS Evaluating Federation Adrian Jackson August 23, 2004 Abstract LAPI is an IBM-specific communication library that performs single-sided operation. This library was well profiled on Phase 1 of

More information

An Evaluation of the DEC Memory Channel Case Studies in Reflective Memory and Cooperative Scheduling

An Evaluation of the DEC Memory Channel Case Studies in Reflective Memory and Cooperative Scheduling An Evaluation of the DEC Memory Channel Case Studies in Reflective Memory and Cooperative Scheduling Andrew Geweke and Frederick Wong University of California, Berkeley {geweke,fredwong}@cs.berkeley.edu

More information

Introduction to Parallel Computing. CPS 5401 Fall 2014 Shirley Moore, Instructor October 13, 2014

Introduction to Parallel Computing. CPS 5401 Fall 2014 Shirley Moore, Instructor October 13, 2014 Introduction to Parallel Computing CPS 5401 Fall 2014 Shirley Moore, Instructor October 13, 2014 1 Definition of Parallel Computing Simultaneous use of multiple compute resources to solve a computational

More information

Multicast can be implemented here

Multicast can be implemented here MPI Collective Operations over IP Multicast? Hsiang Ann Chen, Yvette O. Carrasco, and Amy W. Apon Computer Science and Computer Engineering University of Arkansas Fayetteville, Arkansas, U.S.A fhachen,yochoa,aapong@comp.uark.edu

More information

Network protocols and. network systems INTRODUCTION CHAPTER

Network protocols and. network systems INTRODUCTION CHAPTER CHAPTER Network protocols and 2 network systems INTRODUCTION The technical area of telecommunications and networking is a mature area of engineering that has experienced significant contributions for more

More information

Experience in Offloading Protocol Processing to a Programmable NIC

Experience in Offloading Protocol Processing to a Programmable NIC Experience in Offloading Protocol Processing to a Programmable NIC Arthur B. Maccabe, Wenbin Zhu Computer Science Department The University of New Mexico Albuquerque, NM 87131 Jim Otto, Rolf Riesen Scalable

More information

ELEC 377 Operating Systems. Week 1 Class 2

ELEC 377 Operating Systems. Week 1 Class 2 Operating Systems Week 1 Class 2 Labs vs. Assignments The only work to turn in are the labs. In some of the handouts I refer to the labs as assignments. There are no assignments separate from the labs.

More information

DAFS Storage for High Performance Computing using MPI-I/O: Design and Experience

DAFS Storage for High Performance Computing using MPI-I/O: Design and Experience DAFS Storage for High Performance Computing using MPI-I/O: Design and Experience Vijay Velusamy, Anthony Skjellum MPI Software Technology, Inc. Email: {vijay, tony}@mpi-softtech.com Arkady Kanevsky *,

More information

REMOTE SHARED MEMORY OVER SUN FIRE LINK INTERCONNECT

REMOTE SHARED MEMORY OVER SUN FIRE LINK INTERCONNECT REMOTE SHARED MEMORY OVER SUN FIRE LINK INTERCONNECT Ahmad Afsahi Ying Qian Department of Electrical and Computer Engineering Queen s University Kingston, ON, Canada, K7L 3N6 {ahmad, qiany}@ee.queensu.ca

More information

ATM and Fast Ethernet Network Interfaces for User-level Communication

ATM and Fast Ethernet Network Interfaces for User-level Communication and Fast Ethernet Network Interfaces for User-level Communication Matt Welsh, Anindya Basu, and Thorsten von Eicken {mdw,basu,tve}@cs.cornell.edu Department of Computer Science Cornell University, Ithaca,

More information

Protocols and Software for Exploiting Myrinet Clusters

Protocols and Software for Exploiting Myrinet Clusters Protocols and Software for Exploiting Myrinet Clusters P. Geoffray 1, C. Pham, L. Prylli 2, B. Tourancheau 3, and R. Westrelin Laboratoire RESAM, Université Lyon 1 1 Myricom Inc., 2 ENS-Lyon, 3 SUN Labs

More information

Performance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA TESLA GPU Cluster

Performance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA TESLA GPU Cluster Performance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA TESLA GPU Cluster Veerendra Allada, Troy Benjegerdes Electrical and Computer Engineering, Ames Laboratory Iowa State University &

More information

Introduction to TCP/IP Offload Engine (TOE)

Introduction to TCP/IP Offload Engine (TOE) Introduction to TCP/IP Offload Engine (TOE) Version 1.0, April 2002 Authored By: Eric Yeh, Hewlett Packard Herman Chao, QLogic Corp. Venu Mannem, Adaptec, Inc. Joe Gervais, Alacritech Bradley Booth, Intel

More information

Implementing TreadMarks over GM on Myrinet: Challenges, Design Experience, and Performance Evaluation

Implementing TreadMarks over GM on Myrinet: Challenges, Design Experience, and Performance Evaluation Implementing TreadMarks over GM on Myrinet: Challenges, Design Experience, and Performance Evaluation Ranjit Noronha and Dhabaleswar K. Panda Dept. of Computer and Information Science The Ohio State University

More information

PM2: High Performance Communication Middleware for Heterogeneous Network Environments

PM2: High Performance Communication Middleware for Heterogeneous Network Environments PM2: High Performance Communication Middleware for Heterogeneous Network Environments Toshiyuki Takahashi, Shinji Sumimoto, Atsushi Hori, Hiroshi Harada, and Yutaka Ishikawa Real World Computing Partnership,

More information

Network Design Considerations for Grid Computing

Network Design Considerations for Grid Computing Network Design Considerations for Grid Computing Engineering Systems How Bandwidth, Latency, and Packet Size Impact Grid Job Performance by Erik Burrows, Engineering Systems Analyst, Principal, Broadcom

More information

Computer Science. ! Other approaches:! Special systems designed for extensibility

Computer Science. ! Other approaches:! Special systems designed for extensibility Application-Specific Service Technologies for Commodity OSes in Real-Time Environments Richard West and Gabriel Parmer Boston University Boston, MA {richwest,gabep1}@cs.bu.edu Introduction! Leverage commodity

More information

Micro-Benchmark Level Performance Comparison of High-Speed Cluster Interconnects

Micro-Benchmark Level Performance Comparison of High-Speed Cluster Interconnects Micro-Benchmark Level Performance Comparison of High-Speed Cluster Interconnects Jiuxing Liu Balasubramanian Chandrasekaran Weikuan Yu Jiesheng Wu Darius Buntinas Sushmitha Kini Peter Wyckoff Dhabaleswar

More information

THE U-NET USER-LEVEL NETWORK ARCHITECTURE. Joint work with Werner Vogels, Anindya Basu, and Vineet Buch. or: it s easy to buy high-speed networks

THE U-NET USER-LEVEL NETWORK ARCHITECTURE. Joint work with Werner Vogels, Anindya Basu, and Vineet Buch. or: it s easy to buy high-speed networks Thorsten von Eicken Dept of Computer Science tve@cs.cornell.edu Cornell niversity THE -NET SER-LEVEL NETWORK ARCHITECTRE or: it s easy to buy high-speed networks but making them work is another story NoW

More information

Memory Management Strategies for Data Serving with RDMA

Memory Management Strategies for Data Serving with RDMA Memory Management Strategies for Data Serving with RDMA Dennis Dalessandro and Pete Wyckoff (presenting) Ohio Supercomputer Center {dennis,pw}@osc.edu HotI'07 23 August 2007 Motivation Increasing demands

More information

Towards a Portable Cluster Computing Environment Supporting Single System Image

Towards a Portable Cluster Computing Environment Supporting Single System Image Towards a Portable Cluster Computing Environment Supporting Single System Image Tatsuya Asazu y Bernady O. Apduhan z Itsujiro Arita z Department of Artificial Intelligence Kyushu Institute of Technology

More information

Initial Performance Evaluation of the Cray SeaStar Interconnect

Initial Performance Evaluation of the Cray SeaStar Interconnect Initial Performance Evaluation of the Cray SeaStar Interconnect Ron Brightwell Kevin Pedretti Keith Underwood Sandia National Laboratories Scalable Computing Systems Department 13 th IEEE Symposium on

More information

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? J. Flich 1,P.López 1, M. P. Malumbres 1, J. Duato 1, and T. Rokicki 2 1 Dpto. Informática

More information

A TimeSys Perspective on the Linux Preemptible Kernel Version 1.0. White Paper

A TimeSys Perspective on the Linux Preemptible Kernel Version 1.0. White Paper A TimeSys Perspective on the Linux Preemptible Kernel Version 1.0 White Paper A TimeSys Perspective on the Linux Preemptible Kernel A White Paper from TimeSys Corporation Introduction One of the most basic

More information

Developing a Thin and High Performance Implementation of Message Passing Interface 1

Developing a Thin and High Performance Implementation of Message Passing Interface 1 Developing a Thin and High Performance Implementation of Message Passing Interface 1 Theewara Vorakosit and Putchong Uthayopas Parallel Research Group Computer and Network System Research Laboratory Department

More information

IT 4504 Section 4.0. Network Architectures. 2008, University of Colombo School of Computing 1

IT 4504 Section 4.0. Network Architectures. 2008, University of Colombo School of Computing 1 IT 4504 Section 4.0 Network Architectures 2008, University of Colombo School of Computing 1 Section 4.1 Introduction to Computer Networks 2008, University of Colombo School of Computing 2 Introduction

More information

Intra-MIC MPI Communication using MVAPICH2: Early Experience

Intra-MIC MPI Communication using MVAPICH2: Early Experience Intra-MIC MPI Communication using MVAPICH: Early Experience Sreeram Potluri, Karen Tomko, Devendar Bureddy, and Dhabaleswar K. Panda Department of Computer Science and Engineering Ohio State University

More information

Networking in a Vertically Scaled World

Networking in a Vertically Scaled World Networking in a Vertically Scaled World David S. Miller Red Hat Inc. LinuxTAG, Berlin, 2008 OUTLINE NETWORK PRINCIPLES MICROPROCESSOR HISTORY IMPLICATIONS FOR NETWORKING LINUX KERNEL HORIZONTAL NETWORK

More information

An Empirical Study of Reliable Multicast Protocols over Ethernet Connected Networks

An Empirical Study of Reliable Multicast Protocols over Ethernet Connected Networks An Empirical Study of Reliable Multicast Protocols over Ethernet Connected Networks Ryan G. Lane Daniels Scott Xin Yuan Department of Computer Science Florida State University Tallahassee, FL 32306 {ryanlane,sdaniels,xyuan}@cs.fsu.edu

More information

High-performance message striping over reliable transport protocols

High-performance message striping over reliable transport protocols J Supercomput (2006) 38:261 278 DOI 10.1007/s11227-006-8443-6 High-performance message striping over reliable transport protocols Nader Mohamed Jameela Al-Jaroodi Hong Jiang David Swanson C Science + Business

More information

IO-Lite: A Unified I/O Buffering and Caching System

IO-Lite: A Unified I/O Buffering and Caching System IO-Lite: A Unified I/O Buffering and Caching System Vivek S. Pai, Peter Druschel and Willy Zwaenepoel Rice University (Presented by Chuanpeng Li) 2005-4-25 CS458 Presentation 1 IO-Lite Motivation Network

More information

Communication Kernel for High Speed Networks in the Parallel Environment LANDA-HSN

Communication Kernel for High Speed Networks in the Parallel Environment LANDA-HSN Communication Kernel for High Speed Networks in the Parallel Environment LANDA-HSN Thierry Monteil, Jean Marie Garcia, David Gauchard, Olivier Brun LAAS-CNRS 7 avenue du Colonel Roche 3077 Toulouse, France

More information

Motivation to Teach Network Hardware

Motivation to Teach Network Hardware NetFPGA: An Open Platform for Gigabit-rate Network Switching and Routing John W. Lockwood, Nick McKeown Greg Watson, Glen Gibb, Paul Hartke, Jad Naous, Ramanan Raghuraman, and Jianying Luo JWLockwd@stanford.edu

More information

Analyzing the Receiver Window Modification Scheme of TCP Queues

Analyzing the Receiver Window Modification Scheme of TCP Queues Analyzing the Receiver Window Modification Scheme of TCP Queues Visvasuresh Victor Govindaswamy University of Texas at Arlington Texas, USA victor@uta.edu Gergely Záruba University of Texas at Arlington

More information

Executing Legacy Applications on a Java Operating System

Executing Legacy Applications on a Java Operating System Executing Legacy Applications on a Java Operating System Andreas Gal, Michael Yang, Christian Probst, and Michael Franz University of California, Irvine {gal,mlyang,probst,franz}@uci.edu May 30, 2004 Abstract

More information

Introduction to Ethernet Latency

Introduction to Ethernet Latency Introduction to Ethernet Latency An Explanation of Latency and Latency Measurement The primary difference in the various methods of latency measurement is the point in the software stack at which the latency

More information

Accelerated Library Framework for Hybrid-x86

Accelerated Library Framework for Hybrid-x86 Software Development Kit for Multicore Acceleration Version 3.0 Accelerated Library Framework for Hybrid-x86 Programmer s Guide and API Reference Version 1.0 DRAFT SC33-8406-00 Software Development Kit

More information

Distributed Deadlock Detection for. Distributed Process Networks

Distributed Deadlock Detection for. Distributed Process Networks 0 Distributed Deadlock Detection for Distributed Process Networks Alex Olson Embedded Software Systems Abstract The distributed process network (DPN) model allows for greater scalability and performance

More information

Implementation and Analysis of Large Receive Offload in a Virtualized System

Implementation and Analysis of Large Receive Offload in a Virtualized System Implementation and Analysis of Large Receive Offload in a Virtualized System Takayuki Hatori and Hitoshi Oi The University of Aizu, Aizu Wakamatsu, JAPAN {s1110173,hitoshi}@u-aizu.ac.jp Abstract System

More information