A Modular High Performance Implementation of the Virtual Interface Architecture
Patrick Bozeman, Bill Saphir
National Energy Research Scientific Computing Center (NERSC)
Lawrence Berkeley National Laboratory

1. Overview

The Virtual Interface Architecture (VIA) is an industry standard for low-latency, high-bandwidth interprocess communication over system area networks (SANs). The VIA specification describes a software interface for fully protected user-level communication that can be accelerated by relatively inexpensive VIA-aware hardware. We describe M-VIA, a modular, high-performance, and freely available implementation of VIA for Linux. M-VIA makes two significant contributions to the state of the art. First, M-VIA's modularity allows it to support many types of network interfaces (NICs), including legacy NICs and newer smart NICs that have special support for VIA. This degree of portability has not been achieved or attempted by other user-level communication projects. M-VIA's modularity introduces little overhead, so M-VIA achieves high performance. Second, M-VIA provides applications with a portable and robust interface that verifiably conforms to the VIA standard, including connection management, error detection, error recovery, and precisely defined semantics. These features make it suitable as a reference implementation and as a base for commercial software development. Previous proof-of-concept research projects have demonstrated high performance but have not emphasized robustness in either the interface or the implementation. M-VIA is implemented as a set of loadable kernel modules for Linux and a user-level library. It supports so-called VIA doorbells where they are provided by VIA-aware hardware, and implements software doorbells with a fast trap (a trap to privileged mode that does not incur the overhead of a system call) for legacy hardware.
Transfer of data occurs directly from an application's address space, with no copy other than what is required by the network interface, and no operating system overhead in the critical path. M-VIA coexists with traditional networking, allowing a single network to be used for both VIA and IP traffic.

2. Overview of the Virtual Interface Architecture (VIA)

Academic researchers have developed a variety of techniques for performing very low overhead communication on almost any network. Their research has shown that one can avoid the copying and processing overhead associated with TCP, as well as the overhead of a system call, while still providing full protection. Well-known examples are Active Messages [Eicken92], U-Net [Basu96], and Fast Messages [Pakin95]. These projects have demonstrated a proof of concept, but they have not been widely adopted, even within the high performance computing community. IP remains the only protocol that is widely available. The Virtual Interface Architecture (VIA) is a production-oriented, high-performance communication mechanism for system area networks (SANs). Its design was strongly influenced by the academic research on low-overhead communication as well as by experience with MPPs [Pierce94]. Like these projects, VIA provides fully protected user-level access to a network interface. Because of its widespread industry support (Intel, Compaq, and Microsoft are the three primary promoters of VIA), it is likely that VIA will become widely adopted. Moreover, VIA can be accelerated by relatively inexpensive VIA-aware hardware, and such hardware will more naturally support VIA than competing communication mechanisms. Examples include Giganet [Giganet98], Synfinity [Larson98], and ServerNet-II [Tandem95].
The VIA 1.0 specification [VIA97] was finished in December 1997, after feedback from over a hundred industrial and academic contributors, including the authors of this paper. It provides send and receive operations for message passing, as well as remote memory access operations, which allow read/write access to the memory of a remote process without the explicit cooperation of that process. VIA communication is categorized as unreliable, reliable delivery, or reliable reception. Implementations may provide one or more of these modes, usually depending on characteristics of the hardware (though software may implement reliable VIA on unreliable hardware). VIA provides protected zero-copy data transfer (where supported by network hardware), without requiring operating system kernel assistance. VIA requires that memory used in communication be registered by the application prior to communication, to avoid page faults on transmission or reception of data. Higher-level communication APIs such as the Message Passing Interface (MPI) can be efficiently layered on VIA [Dimitrov99]. While we are primarily interested in scientific computation, VIA has a number of commercial applications in the area of high-performance servers. Other important commercial drivers for VIA are the forthcoming NGIO and Future IO standards for high-performance peripherals. NGIO is expected to rely on VIA as its transport mechanism. The VI Architecture consists of three components. The user-visible component is a library known as VIPL (VI Provider Library) that contains routines for data transfer, connection management, queue management, memory registration, and error handling. The second component, the VI Kernel Agent, provides necessary kernel services, including connection management and memory registration. The third component, the VI Network Interface (VI NIC), performs the actual data transfer.
It is conceptually a piece of hardware, but may be implemented as a combination of hardware and software. The NIC can directly access user memory and provides a doorbell (usually a memory-mapped register) that VIPL uses to notify the NIC that new entries have been placed in VI work queues. To send and receive messages, a user application writes a VIA descriptor in an area of registered memory, and calls a VIPL routine that presses the doorbell to let the NIC know that the descriptor is available for processing. Of the three major components, only VIPL is specified in detail by the VIA standard, and even this specification is only a recommendation. To enable truly portable applications, Intel wrote the VI Architecture Developer's Guide [Intel98], which specifies the VIA API in much more detail. Intel also released an extensive conformance test suite to determine whether VIA implementations are in compliance. VIA applications that use VIPL (as clarified by the Developer's Guide) should be portable between different conforming VIA implementations. The majority of the VIA community supports the adoption of the standard interface specified in the Developer's Guide.

3. The M-VIA Implementation of VIA

We have developed a high-performance, modular implementation of VIA for the Linux operating system called Modular VIA (M-VIA). M-VIA is implemented as a user-level library (libvipl.a) and at least two loadable kernel modules for Linux. The core module is device-independent and provides the majority of the functionality needed by VIA. One or more device-specific modules, called device modules, implement device-specific functionality. A device module is essentially a device driver, and includes the standard device driver code plus M-VIA-specific modifications.

3.1 M-VIA Modular Design

A primary design goal of M-VIA is to enable the rapid implementation of VIA for new network interfaces, including legacy dumb NICs as well as newer smart NICs with either special VIA support (e.g.
support for VIA doorbells and VIA descriptor processing) or programmable processors. M-VIA achieves this goal through a modular implementation. It provides a complete VIA framework, but allows a device module to
replace a subset of VIA functionality in a device-specific way. We describe a VIA implementation with no hardware support as software-only, and otherwise as hardware-accelerated. This modular division between core management and device-specific operations facilitates the rapid development of support for new devices. In a hardware-accelerated implementation, the device module can register hardware functionality to allow the hardware to take over core functions, such as memory and doorbell management. In particular, VIA doorbells for VIA-aware hardware are usually implemented as memory-mapped registers read and written by user-level code to tell the network interface that new descriptors have been posted. With hardware acceleration, M-VIA requires no memory-to-memory copies to transfer data. In a software-only implementation, it is critical that the doorbell operation have as little overhead as possible. M-VIA uses a fast trap to execute privileged code with minimum overhead. A fast trap incurs significantly less overhead than a system call, which performs additional operations related to scheduling and signal processing. The 38 instructions of assembly code that implement the fast trap constitute the only processor-specific code in M-VIA (currently the x86 architecture is supported; Alpha and PowerPC support are planned). In a software-only implementation, data transmission requires a single memory-to-memory copy inside the interrupt handler at the receiver. This copy is unavoidable for protected communication without special hardware support. M-VIA provides wire-level interoperability among software-only NICs. This is facilitated by an additional abstraction called a Device Class, which is a framework within the device module for handling devices with similar characteristics. For instance, an EtherRing class can be used for Ethernet devices with a circular queue of buffer descriptors.
The majority of Ethernet devices use this as their internal architecture. Of course, wire-level interoperability is not restricted to such a class, only facilitated by it. Modularity does not adversely affect performance. Time-sensitive operations, such as the actual transmission of data, are fast-pathed. Specifically, communication between Devices and Device Classes is through macros rather than function calls, and VIA doorbell operations for software NICs are implemented with fast traps. M-VIA achieves high bandwidth for software-only NICs by incorporating virtual memory management into the core module, enabling the transfer of data directly from an application's address space, with no additional memory copies other than those required by the network interface. A side benefit of this approach is that communication within an SMP requires only a single memory copy, whereas arbitrary message passing between separate address spaces requires two copies for any mechanism that is implemented purely in user space. Thus, the bandwidth of non-pipelined VIA communication between two processes on an SMP is approximately two times higher than is achievable through other mechanisms.

3.1.1 M-VIA Core Module

The M-VIA core module is divided into device-independent, reusable functional components:

Connection Manager: Establishes logical point-to-point connections between VIs.

Protection Tag Manager: Allocates, deallocates, and validates memory protection tags.

Registered Memory Manager: Handles the registration of user communication buffers and descriptor buffers.

Completion Queue Manager: Manages the optional completion queues associated with VI work queues, as well as user requests to block on completion.

Error Queue Manager: Provides a mechanism for the posting of asynchronous errors by VIA devices and for blocking on errors by the asynchronous error-handling threads of VI applications.
Linux Kernel Extensions: Provides functionality required for an efficient implementation, including condition variables, user-to-kernel memory remapping, and user-to-physical address translation.
The core module provides the default functionality for all VIA operations. To perform device-specific functions, the framework components listed above call routines registered by specific device modules. For example, the Connection Manager handles the common support issues relating to queuing requests: blocking for connection completion, verifying connection attributes, assigning a unique connection id, and so on. However, the Connection Manager calls functions registered by the device module to actually perform the transmission of the request, acceptance, or rejection of a connection to a remote device. Operations that are entirely device-specific, such as the creation and destruction of VIs and the transmission of data to and from the wire, are passed directly to the appropriate device. However, the core framework provides some functions to make such operations easier to implement. For example, generic descriptor processing routines are provided for use by software-only devices.

3.1.2 Device Modules

A device module provides the abstraction of a VI NIC. When a device module registers itself with the core module, it informs the core module of its capabilities, such as whether it supports VIA directly in hardware, its native MTU size, the maximum number of VIA descriptors that can be queued for transmission, etc. The device module also registers device-specific functions to be used by the modular managers of the core module. The developer of the device module has the option of overriding any and all of the default functionality provided by the core module. For example, if a device that provides native VIA hardware support uses its own mechanism for registering memory, it may completely replace the Registered Memory Manager with an implementation of its own.

3.1.3 Device Classes

Many commodity network interfaces can be logically grouped into common categories such as Ethernet, ATM, FDDI, etc.
In order to promote wire-level interoperability and rapid development through code reuse, device modules can be written using an internal abstraction called a Device Class. M-VIA device classes are slightly finer-grained than network types; an example is the EtherRing class mentioned above. Device Classes enable common routines for a class of network interfaces to be shared by device modules. Such routines include operations such as the construction and interpretation of media-specific VIA headers and mechanisms for enabling VIA to coexist with traditional networking protocols, i.e., TCP/IP. While Device Classes are not explicitly supported by the device module interface, that interface is designed to facilitate their use. Macros are used for communication between device-specific code and a device class, and these are integrated into a device module.

3.1.4 The VI Provider Library (VIPL)

M-VIA contains a single VI Provider Library, VIPL, which is interoperable with software-only and hardware-accelerated VIA devices developed within the M-VIA framework. Device modules specify to VIPL whether the VI Provider Library should use ioctl system calls or fast traps to call time-sensitive VI Kernel Agent services. Device modules also specify whether the VIA doorbell mechanism is supported directly in hardware as a true memory-mapped doorbell or should be emulated with a fast trap.

3.1.5 M-VIA 2

Based on experience gained with M-VIA 1, we have begun the design of a modified internal organization for M-VIA 2. The modifications affect both VIPL and the core module. M-VIA 2 design documents are available online.
The original design of M-VIA was based upon early drafts of the Virtual Interface Architecture Specification. Unfortunately, when the VI Architecture 1.0 specification was released, it relaxed the specification in areas relating to hardware interaction, becoming a specification of the user-level VIA component only. This change requires devices to be capable of providing custom user-level functionality in order to operate efficiently. Currently, NIC-specific functionality can be substituted in the Kernel Agent only. Two specific examples of this are doorbells and completion queues. In pre-1.0 versions of the VI Architecture specification, doorbell operations used a standardized Doorbell Token format. The Doorbell Token format is no longer specified in VIA 1.0 (including in the Developer's Guide). A similar problem occurred with the introduction of Completion Queues in the VI Architecture Specification 1.0. To be implemented efficiently, Completion Queues require direct communication between the VI NIC and VIPL. However, the mechanism and data structures used to accomplish this are not defined. The modularized VIPL implementation in M-VIA 2 will enable the substitution of device-specific functionality at the user level as well as inside the kernel.

3.2 Functionality and Conformance

The Intel Virtual Interface Architecture Developer's Guide describes three levels of conformance to the VIPL API: Early Adopter, Functional, and Full conformance. The Intel VI Architecture Conformance Suite [Intel98a] tests an implementation's conformance to the VIPL API. The conformance suite performs thousands of individual tests grouped into functional categories: 34 for Early Adopter, 134 for Functional Conformance, and 156 for Full Conformance. Basic VIPL semantic compliance, resource management, proper handling of error conditions, invalid inputs, and network stress tests are included in the conformance suite.
M-VIA passes all of the Early Adopter conformance tests on unreliable networks and includes RDMA Write capability. Reliable Delivery and Reliable Reception will be supported for networks that support these modes. At the Functional Conformance level, M-VIA implements all functionality except peer-to-peer connection management and resizing of completion queues, including synchronous error handling, remote disconnect notification, and Protection Tag support. M-VIA passes 109 of the 134 Functional Conformance tests included in the test suite. The tests that M-VIA does not pass either contain bugs or call the peer-to-peer connection management routines. The only additional functions missing from the Full Conformance level are the notify routines, which are essentially syntactic sugar. M-VIA uses POSIX threads (pthreads) internally for asynchronous error notification, and is pthreads-compatible, enabling the development of multi-threaded user applications. Operations performed within a multi-threaded application on different VIs are inherently thread-safe, but an application must currently provide its own explicit locks if multiple threads access a single VI. A fully thread-safe VIPL will be part of M-VIA 2.

3.3 Implementation Status

M-VIA 1.0 supports four NIC types: loopback, Fast Ethernet cards based on the DEC Tulip chip, the Packet Engines GNIC-1 Gigabit Ethernet card, and the Packet Engines GNIC-II Gigabit Ethernet card. We have focused on only a small number of interfaces for two reasons. First, we anticipated fine-tuning the internal interfaces, and did not want to redo the work of implementing all the drivers. Second, our primary goal was a complete and robust implementation of VIA.
As described in section 3.1.5, we are currently redesigning the internal interfaces for VIPL and the VI Kernel Agent, based on experience with the original design, in order to improve support for smart NICs. This redesign will form the basis of M-VIA 2. With M-VIA 2, we expect an explosion of third-party driver development. There are third-party plans to implement drivers for Giganet, Myrinet, ServerNet, Alteon Gigabit Ethernet, and several Intel NICs. M-VIA is freely available for download over the Internet.

3.4 Performance

While the primary focus of M-VIA development so far has been functionality, robustness, and modularity, it achieves excellent performance as well. We present here some basic performance comparisons to demonstrate this fact, leaving a more detailed analysis for another report. Latency and bandwidth reported below are measured using a simple ping-pong benchmark, in which two processes send a buffer of data back and forth. Latency reported below is one-half the round-trip time for 4-byte messages, and bandwidth is message size divided by one-half the round-trip time for 32-Kbyte messages. (This message size is an artifact of our benchmark program, which uses exponentially increasing message sizes up to 32K; bandwidth is not sensitive to this value.) Although this is a crude measure, the same conclusions hold up under more detailed analysis. The following tables show M-VIA performance (under Linux) and TCP performance under several operating systems, using identical processors and each of three NIC types: Loopback (a virtual loopback device, not involving a PCI device), Tulip-based Fast Ethernet (Kingston), and the Packet Engines GNIC-II Gigabit Ethernet NIC. We used PCs with 400 MHz Pentium II processors and Corsair CAS-2 PC-100 memory on ASUS-P2X motherboards. The Tulip and GNIC-II measurements were made with uniprocessor systems connected back-to-back.
The loopback measurements were made on a 2-processor system with the same processors and memory, and an ASUS motherboard from the same family. Linux measurements are based on the SMP kernel; Solaris measurements are based on Solaris 7 for x86; Windows NT measurements are for NT 4. We observe that M-VIA performance is significantly better than TCP performance in all cases, and that the relative performance of VIA is better for faster networks. While a comparison to TCP performance is not a definitive assessment, it does demonstrate that M-VIA performance is respectable.

Table 1: Latency (in microseconds). Lower is better.

                            M-VIA/Linux   TCP/Linux   TCP/Solaris   TCP/NT
  Loopback
  GNIC-II Gigabit Ethernet                              NA*           82.5
  Tulip Fast Ethernet

Table 2: Bandwidth (in Megabytes/s). Higher is better.

                            M-VIA/Linux   TCP/Linux   TCP/Solaris   TCP/NT
  Loopback
  GNIC-II Gigabit Ethernet                              NA            14.8
  Tulip Fast Ethernet

* A GNIC-II driver is not available for Solaris 7/x86.
Comparisons to other VIA implementations are difficult because we do not have an apples-to-apples comparison on the same hardware. We mention here some other results to provide perspective, though a direct comparison is not appropriate. An early proof-of-concept VIA implementation from Intel on Fast Ethernet hardware had a latency of about 60 µs [Berry97], more than twice the latency we report here for Tulip-based Ethernet. Berkeley VIA [Buonadonna98], a partial implementation of VIA oriented towards research, has a latency of 35 µs and a bandwidth of 51 MB/s on Myrinet. U-Net on Tulip Fast Ethernet [Welsh96] with 200 MHz Pentium Pro processors has a latency of approximately 25 µs. The M-VIA fast-trap mechanism is essentially the same as that used by U-Net, so we expect performance to be nearly identical. Giganet reports a VIA latency of 8.5 µs for their NT implementation of VIA with specialized VIA-aware hardware. In all cases, the biggest bottleneck is ultimately the PCI interface. When NGIO and/or Future IO devices become available, we expect latency to fall considerably.

4. Conclusions and Plans

M-VIA is the first non-proprietary implementation of VIA. Its modular design facilitates rapid implementation on new network adapters and interoperability, without compromising high performance. An important goal of our work is to provide a reference implementation of VIA that will promote and facilitate the development of high-performance portable VIA applications, and facilitate the development of VIA on other systems. As we have described, M-VIA enables the rapid development of drivers for new NICs, providing portability among NICs. We plan, or know of plans, to port M-VIA to new processors (the only processor-specific code is related to the fast-trap mechanism for software-only drivers), providing portability among processors.
Furthermore, although M-VIA obviously has operating system dependencies, we do not believe there are any fundamental difficulties in porting it to new operating systems. M-VIA development started on FreeBSD before moving to Linux, and a preliminary assessment of the feasibility of an NT port [Buonadonna99] is encouraging.
5. References

[Basu96] A. Basu, V. Buch, W. Vogels, T. von Eicken. U-Net: A User-Level Network Interface for Parallel and Distributed Computing. Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP), Copper Mountain, Colorado, December 1995.

[Berry97] F. Berry, E. Deleganes, A. M. Merritt, Intel Corporation. The Virtual Interface Architecture Proof of Concept Performance Results. Available online.

[Boden95] N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. Seitz, J. N. Seizovic, W. Su. Myrinet: A Gigabit-per-Second Local Area Network. IEEE Micro, Vol. 15, February 1995.

[Buonadonna98] P. Buonadonna, A. Geweke, D. Culler. An Implementation and Analysis of the Virtual Interface Architecture. Proceedings of SC98, Orlando, Florida, November 1998.

[Buonadonna99] P. Buonadonna. Private communication, April 1999.

[Clark89] D. D. Clark, V. Jacobson, J. Romkey, H. Salwen. An Analysis of TCP Processing Overhead. IEEE Communications Magazine, June 1989.

[Dimitrov99] R. Dimitrov, A. Skjellum. An Efficient MPI Implementation for Virtual Interface (VI) Architecture-Enabled Cluster Computing. Proceedings of the MPI Developers Conference, 1999.

[Eicken92] T. von Eicken, D. Culler, S. C. Goldstein, K. Schauser. Active Messages: A Mechanism for Integrated Communication and Computation. Proceedings of the 19th International Symposium on Computer Architecture, Gold Coast, Australia, May 1992.

[Giganet98] GigaNet Corporation. High Performance cLAN Host Adapters. Available online.

[Intel98] Intel Corporation. The Intel VI Architecture Developer's Guide, V1.0. September 1998. Available at ftp://download.intel.com/design/servers/vi/intel.pdf.

[Intel98a] Intel Corporation. The Intel VI Architecture Conformance Suite User's Guide, v0.5. December 1998. Available at ftp://download.intel.com/design/servers/vi/userguide_v0.5.pdf.

[Larson98] J. Larson. The HAL Interconnect PCI Card. 1998.

[Pakin95] S. Pakin, M. Lauria, A. Chien.
High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet. Proceedings of Supercomputing '95, San Diego, California, 1995.

[Pierce94] P. Pierce, G. Regnier. The Paragon Message Passing Interface. Proceedings of SHPCC '94, 1994.

[Tandem95] Tandem Corporation. ServerNet Interconnect Technology. 1995.

[VIA97] Compaq Computer Corporation, Intel Corporation, Microsoft Corporation. Virtual Interface Architecture Specification, Version 1.0, December 1997. Available online.

[Welsh96] M. Welsh, A. Basu, T. von Eicken. Low-Latency Communication over Fast Ethernet. Proceedings of Euro-Par '96, Lyon, France, August 27-29, 1996.
Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct
More informationA First Implementation of In-Transit Buffers on Myrinet GM Software Λ
A First Implementation of In-Transit Buffers on Myrinet GM Software Λ S. Coll, J. Flich, M. P. Malumbres, P. López, J. Duato and F.J. Mora Universidad Politécnica de Valencia Camino de Vera, 14, 46071
More informationOptimizing TCP in a Cluster of Low-End Linux Machines
Optimizing TCP in a Cluster of Low-End Linux Machines ABDALLA MAHMOUD, AHMED SAMEH, KHALED HARRAS, TAREK DARWICH Dept. of Computer Science, The American University in Cairo, P.O.Box 2511, Cairo, EGYPT
More informationSEDA: An Architecture for Well-Conditioned, Scalable Internet Services
SEDA: An Architecture for Well-Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University of California, Berkeley Operating Systems Principles
More informationHigh performance communication subsystem for clustering standard high-volume servers using Gigabit Ethernet
Title High performance communication subsystem for clustering standard high-volume servers using Gigabit Ethernet Author(s) Zhu, W; Lee, D; Wang, CL Citation The 4th International Conference/Exhibition
More informationMotivation CPUs can not keep pace with network
Deferred Segmentation For Wire-Speed Transmission of Large TCP Frames over Standard GbE Networks Bilic Hrvoye (Billy) Igor Chirashnya Yitzhak Birk Zorik Machulsky Technion - Israel Institute of technology
More informationLessons learned from MPI
Lessons learned from MPI Patrick Geoffray Opinionated Senior Software Architect patrick@myri.com 1 GM design Written by hardware people, pre-date MPI. 2-sided and 1-sided operations: All asynchronous.
More informationVirtualization, Xen and Denali
Virtualization, Xen and Denali Susmit Shannigrahi November 9, 2011 Susmit Shannigrahi () Virtualization, Xen and Denali November 9, 2011 1 / 70 Introduction Virtualization is the technology to allow two
More informationOutline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview Use Cases Architecture Features Copyright Jaluna SA. All rights reserved
C5 Micro-Kernel: Real-Time Services for Embedded and Linux Systems Copyright 2003- Jaluna SA. All rights reserved. JL/TR-03-31.0.1 1 Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview
More informationParallel Computing Trends: from MPPs to NoWs
Parallel Computing Trends: from MPPs to NoWs (from Massively Parallel Processors to Networks of Workstations) Fall Research Forum Oct 18th, 1994 Thorsten von Eicken Department of Computer Science Cornell
More informationAn Extensible Message-Oriented Offload Model for High-Performance Applications
An Extensible Message-Oriented Offload Model for High-Performance Applications Patricia Gilfeather and Arthur B. Maccabe Scalable Systems Lab Department of Computer Science University of New Mexico pfeather@cs.unm.edu,
More informationSeekable Sockets: A Mechanism to Reduce Copy Overheads in TCP-based Messaging
Seekable Sockets: A Mechanism to Reduce Copy Overheads in TCP-based Messaging Chase Douglas and Vijay S. Pai Purdue University West Lafayette, IN 47907 {cndougla, vpai}@purdue.edu Abstract This paper extends
More informationBuilding MPI for Multi-Programming Systems using Implicit Information
Building MPI for Multi-Programming Systems using Implicit Information Frederick C. Wong 1, Andrea C. Arpaci-Dusseau 2, and David E. Culler 1 1 Computer Science Division, University of California, Berkeley
More informationEliminating the Protocol Stack for Socket based Communication in Shared Memory Interconnects
Eliminating the Protocol Stack for Socket based Communication in Shared Memory Interconnects Stein Jørgen Ryan and Haakon Bryhni Department of Informatics, University of Oslo PO Box 1080, Blindern, N-0316
More informationSecurity versus Performance Tradeoffs in RPC Implementations for Safe Language Systems
Security versus Performance Tradeoffs in RPC Implementations for Safe Language Systems Chi-Chao Chang, Grzegorz Czajkowski, Chris Hawblitzel, Deyu Hu, and Thorsten von Eicken Department of Computer Science
More informationInfiniband Fast Interconnect
Infiniband Fast Interconnect Yuan Liu Institute of Information and Mathematical Sciences Massey University May 2009 Abstract Infiniband is the new generation fast interconnect provides bandwidths both
More information1/5/2012. Overview of Interconnects. Presentation Outline. Myrinet and Quadrics. Interconnects. Switch-Based Interconnects
Overview of Interconnects Myrinet and Quadrics Leading Modern Interconnects Presentation Outline General Concepts of Interconnects Myrinet Latest Products Quadrics Latest Release Our Research Interconnects
More informationLoaded: Server Load Balancing for IPv6
Loaded: Server Load Balancing for IPv6 Sven Friedrich, Sebastian Krahmer, Lars Schneidenbach, Bettina Schnor Institute of Computer Science University Potsdam Potsdam, Germany fsfried, krahmer, lschneid,
More informationLiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster
LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster H. W. Jin, S. Sur, L. Chai, and D. K. Panda Network-Based Computing Laboratory Department of Computer Science and Engineering
More informationPerformance Evaluation of InfiniBand with PCI Express
Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Server Technology Group IBM T. J. Watson Research Center Yorktown Heights, NY 1598 jl@us.ibm.com Amith Mamidala, Abhinav Vishnu, and Dhabaleswar
More informationPerformance Evaluation of InfiniBand with PCI Express
Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Amith Mamidala Abhinav Vishnu Dhabaleswar K Panda Department of Computer and Science and Engineering The Ohio State University Columbus,
More informationOperating System Architecture. CS3026 Operating Systems Lecture 03
Operating System Architecture CS3026 Operating Systems Lecture 03 The Role of an Operating System Service provider Provide a set of services to system users Resource allocator Exploit the hardware resources
More informationSwitch. Switch. PU: Pentium Pro 200MHz Memory: 128MB Myricom Myrinet 100Base-T Ethernet
COMPaS: A Pentium Pro PC-based SMP Cluster and its Experience Yoshio Tanaka 1, Motohiko Matsuda 1, Makoto Ando 1, Kazuto Kubota and Mitsuhisa Sato 1 Real World Computing Partnership fyoshio,matu,ando,kazuto,msatog@trc.rwcp.or.jp
More informationPerformance of the MP_Lite message-passing library on Linux clusters
Performance of the MP_Lite message-passing library on Linux clusters Dave Turner, Weiyi Chen and Ricky Kendall Scalable Computing Laboratory, Ames Laboratory, USA Abstract MP_Lite is a light-weight message-passing
More informationChapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup
Chapter 4 Routers with Tiny Buffers: Experiments This chapter describes two sets of experiments with tiny buffers in networks: one in a testbed and the other in a real network over the Internet2 1 backbone.
More informationMPI History. MPI versions MPI-2 MPICH2
MPI versions MPI History Standardization started (1992) MPI-1 completed (1.0) (May 1994) Clarifications (1.1) (June 1995) MPI-2 (started: 1995, finished: 1997) MPI-2 book 1999 MPICH 1.2.4 partial implemention
More informationAgenda. Threads. Single and Multi-threaded Processes. What is Thread. CSCI 444/544 Operating Systems Fall 2008
Agenda Threads CSCI 444/544 Operating Systems Fall 2008 Thread concept Thread vs process Thread implementation - user-level - kernel-level - hybrid Inter-process (inter-thread) communication What is Thread
More informationRoCE vs. iwarp Competitive Analysis
WHITE PAPER February 217 RoCE vs. iwarp Competitive Analysis Executive Summary...1 RoCE s Advantages over iwarp...1 Performance and Benchmark Examples...3 Best Performance for Virtualization...5 Summary...6
More informationDesign and Implementation of a Monitoring and Scheduling System for Multiple Linux PC Clusters*
Design and Implementation of a Monitoring and Scheduling System for Multiple Linux PC Clusters* Chao-Tung Yang, Chun-Sheng Liao, and Ping-I Chen High-Performance Computing Laboratory Department of Computer
More informationDesign issues and performance comparisons in supporting the sockets interface over user-level communication architecture
J Supercomput (2007) 39: 205 226 DOI 10.1007/s11227-007-0109-5 Design issues and performance comparisons in supporting the sockets interface over user-level communication architecture Jae-Wan Jang Jin-Soo
More information1 Introduction Myrinet grew from the results of two ARPA-sponsored projects. Caltech's Mosaic and the USC Information Sciences Institute (USC/ISI) ATO
An Overview of Myrinet Ralph Zajac Rochester Institute of Technology Dept. of Computer Engineering EECC 756 Multiple Processor Systems Dr. M. Shaaban 5/18/99 Abstract The connections between the processing
More informationCSE 4/521 Introduction to Operating Systems. Lecture 29 Windows 7 (History, Design Principles, System Components, Programmer Interface) Summer 2018
CSE 4/521 Introduction to Operating Systems Lecture 29 Windows 7 (History, Design Principles, System Components, Programmer Interface) Summer 2018 Overview Objective: To explore the principles upon which
More informationPCI Express System Interconnect Software Architecture for PowerQUICC TM III-based Systems
PCI Express System Interconnect Software Architecture for PowerQUICC TM III-based Systems Application Note AN-573 By Craig Hackney Introduction A multi-peer system using a standard-based PCI Express multi-port
More informationUnder the Hood, Part 1: Implementing Message Passing
Lecture 27: Under the Hood, Part 1: Implementing Message Passing Parallel Computer Architecture and Programming CMU 15-418/15-618, Fall 2017 Today s Theme 2 Message passing model (abstraction) Threads
More informationMaking TCP Viable as a High Performance Computing Protocol
Making TCP Viable as a High Performance Computing Protocol Patricia Gilfeather and Arthur B. Maccabe Scalable Systems Lab Department of Computer Science University of New Mexico pfeather@cs.unm.edu maccabe@cs.unm.edu
More informationQuickSpecs. HP Z 10GbE Dual Port Module. Models
Overview Models Part Number: 1Ql49AA Introduction The is a 10GBASE-T adapter utilizing the Intel X722 MAC and X557-AT2 PHY pairing to deliver full line-rate performance, utilizing CAT 6A UTP cabling (or
More informationLAPI on HPS Evaluating Federation
LAPI on HPS Evaluating Federation Adrian Jackson August 23, 2004 Abstract LAPI is an IBM-specific communication library that performs single-sided operation. This library was well profiled on Phase 1 of
More informationAn Evaluation of the DEC Memory Channel Case Studies in Reflective Memory and Cooperative Scheduling
An Evaluation of the DEC Memory Channel Case Studies in Reflective Memory and Cooperative Scheduling Andrew Geweke and Frederick Wong University of California, Berkeley {geweke,fredwong}@cs.berkeley.edu
More informationIntroduction to Parallel Computing. CPS 5401 Fall 2014 Shirley Moore, Instructor October 13, 2014
Introduction to Parallel Computing CPS 5401 Fall 2014 Shirley Moore, Instructor October 13, 2014 1 Definition of Parallel Computing Simultaneous use of multiple compute resources to solve a computational
More informationMulticast can be implemented here
MPI Collective Operations over IP Multicast? Hsiang Ann Chen, Yvette O. Carrasco, and Amy W. Apon Computer Science and Computer Engineering University of Arkansas Fayetteville, Arkansas, U.S.A fhachen,yochoa,aapong@comp.uark.edu
More informationNetwork protocols and. network systems INTRODUCTION CHAPTER
CHAPTER Network protocols and 2 network systems INTRODUCTION The technical area of telecommunications and networking is a mature area of engineering that has experienced significant contributions for more
More informationExperience in Offloading Protocol Processing to a Programmable NIC
Experience in Offloading Protocol Processing to a Programmable NIC Arthur B. Maccabe, Wenbin Zhu Computer Science Department The University of New Mexico Albuquerque, NM 87131 Jim Otto, Rolf Riesen Scalable
More informationELEC 377 Operating Systems. Week 1 Class 2
Operating Systems Week 1 Class 2 Labs vs. Assignments The only work to turn in are the labs. In some of the handouts I refer to the labs as assignments. There are no assignments separate from the labs.
More informationDAFS Storage for High Performance Computing using MPI-I/O: Design and Experience
DAFS Storage for High Performance Computing using MPI-I/O: Design and Experience Vijay Velusamy, Anthony Skjellum MPI Software Technology, Inc. Email: {vijay, tony}@mpi-softtech.com Arkady Kanevsky *,
More informationREMOTE SHARED MEMORY OVER SUN FIRE LINK INTERCONNECT
REMOTE SHARED MEMORY OVER SUN FIRE LINK INTERCONNECT Ahmad Afsahi Ying Qian Department of Electrical and Computer Engineering Queen s University Kingston, ON, Canada, K7L 3N6 {ahmad, qiany}@ee.queensu.ca
More informationATM and Fast Ethernet Network Interfaces for User-level Communication
and Fast Ethernet Network Interfaces for User-level Communication Matt Welsh, Anindya Basu, and Thorsten von Eicken {mdw,basu,tve}@cs.cornell.edu Department of Computer Science Cornell University, Ithaca,
More informationProtocols and Software for Exploiting Myrinet Clusters
Protocols and Software for Exploiting Myrinet Clusters P. Geoffray 1, C. Pham, L. Prylli 2, B. Tourancheau 3, and R. Westrelin Laboratoire RESAM, Université Lyon 1 1 Myricom Inc., 2 ENS-Lyon, 3 SUN Labs
More informationPerformance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA TESLA GPU Cluster
Performance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA TESLA GPU Cluster Veerendra Allada, Troy Benjegerdes Electrical and Computer Engineering, Ames Laboratory Iowa State University &
More informationIntroduction to TCP/IP Offload Engine (TOE)
Introduction to TCP/IP Offload Engine (TOE) Version 1.0, April 2002 Authored By: Eric Yeh, Hewlett Packard Herman Chao, QLogic Corp. Venu Mannem, Adaptec, Inc. Joe Gervais, Alacritech Bradley Booth, Intel
More informationImplementing TreadMarks over GM on Myrinet: Challenges, Design Experience, and Performance Evaluation
Implementing TreadMarks over GM on Myrinet: Challenges, Design Experience, and Performance Evaluation Ranjit Noronha and Dhabaleswar K. Panda Dept. of Computer and Information Science The Ohio State University
More informationPM2: High Performance Communication Middleware for Heterogeneous Network Environments
PM2: High Performance Communication Middleware for Heterogeneous Network Environments Toshiyuki Takahashi, Shinji Sumimoto, Atsushi Hori, Hiroshi Harada, and Yutaka Ishikawa Real World Computing Partnership,
More informationNetwork Design Considerations for Grid Computing
Network Design Considerations for Grid Computing Engineering Systems How Bandwidth, Latency, and Packet Size Impact Grid Job Performance by Erik Burrows, Engineering Systems Analyst, Principal, Broadcom
More informationComputer Science. ! Other approaches:! Special systems designed for extensibility
Application-Specific Service Technologies for Commodity OSes in Real-Time Environments Richard West and Gabriel Parmer Boston University Boston, MA {richwest,gabep1}@cs.bu.edu Introduction! Leverage commodity
More informationMicro-Benchmark Level Performance Comparison of High-Speed Cluster Interconnects
Micro-Benchmark Level Performance Comparison of High-Speed Cluster Interconnects Jiuxing Liu Balasubramanian Chandrasekaran Weikuan Yu Jiesheng Wu Darius Buntinas Sushmitha Kini Peter Wyckoff Dhabaleswar
More informationTHE U-NET USER-LEVEL NETWORK ARCHITECTURE. Joint work with Werner Vogels, Anindya Basu, and Vineet Buch. or: it s easy to buy high-speed networks
Thorsten von Eicken Dept of Computer Science tve@cs.cornell.edu Cornell niversity THE -NET SER-LEVEL NETWORK ARCHITECTRE or: it s easy to buy high-speed networks but making them work is another story NoW
More informationMemory Management Strategies for Data Serving with RDMA
Memory Management Strategies for Data Serving with RDMA Dennis Dalessandro and Pete Wyckoff (presenting) Ohio Supercomputer Center {dennis,pw}@osc.edu HotI'07 23 August 2007 Motivation Increasing demands
More informationTowards a Portable Cluster Computing Environment Supporting Single System Image
Towards a Portable Cluster Computing Environment Supporting Single System Image Tatsuya Asazu y Bernady O. Apduhan z Itsujiro Arita z Department of Artificial Intelligence Kyushu Institute of Technology
More informationInitial Performance Evaluation of the Cray SeaStar Interconnect
Initial Performance Evaluation of the Cray SeaStar Interconnect Ron Brightwell Kevin Pedretti Keith Underwood Sandia National Laboratories Scalable Computing Systems Department 13 th IEEE Symposium on
More informationCombining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?
Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? J. Flich 1,P.López 1, M. P. Malumbres 1, J. Duato 1, and T. Rokicki 2 1 Dpto. Informática
More informationA TimeSys Perspective on the Linux Preemptible Kernel Version 1.0. White Paper
A TimeSys Perspective on the Linux Preemptible Kernel Version 1.0 White Paper A TimeSys Perspective on the Linux Preemptible Kernel A White Paper from TimeSys Corporation Introduction One of the most basic
More informationDeveloping a Thin and High Performance Implementation of Message Passing Interface 1
Developing a Thin and High Performance Implementation of Message Passing Interface 1 Theewara Vorakosit and Putchong Uthayopas Parallel Research Group Computer and Network System Research Laboratory Department
More informationIT 4504 Section 4.0. Network Architectures. 2008, University of Colombo School of Computing 1
IT 4504 Section 4.0 Network Architectures 2008, University of Colombo School of Computing 1 Section 4.1 Introduction to Computer Networks 2008, University of Colombo School of Computing 2 Introduction
More informationIntra-MIC MPI Communication using MVAPICH2: Early Experience
Intra-MIC MPI Communication using MVAPICH: Early Experience Sreeram Potluri, Karen Tomko, Devendar Bureddy, and Dhabaleswar K. Panda Department of Computer Science and Engineering Ohio State University
More informationNetworking in a Vertically Scaled World
Networking in a Vertically Scaled World David S. Miller Red Hat Inc. LinuxTAG, Berlin, 2008 OUTLINE NETWORK PRINCIPLES MICROPROCESSOR HISTORY IMPLICATIONS FOR NETWORKING LINUX KERNEL HORIZONTAL NETWORK
More informationAn Empirical Study of Reliable Multicast Protocols over Ethernet Connected Networks
An Empirical Study of Reliable Multicast Protocols over Ethernet Connected Networks Ryan G. Lane Daniels Scott Xin Yuan Department of Computer Science Florida State University Tallahassee, FL 32306 {ryanlane,sdaniels,xyuan}@cs.fsu.edu
More informationHigh-performance message striping over reliable transport protocols
J Supercomput (2006) 38:261 278 DOI 10.1007/s11227-006-8443-6 High-performance message striping over reliable transport protocols Nader Mohamed Jameela Al-Jaroodi Hong Jiang David Swanson C Science + Business
More informationIO-Lite: A Unified I/O Buffering and Caching System
IO-Lite: A Unified I/O Buffering and Caching System Vivek S. Pai, Peter Druschel and Willy Zwaenepoel Rice University (Presented by Chuanpeng Li) 2005-4-25 CS458 Presentation 1 IO-Lite Motivation Network
More informationCommunication Kernel for High Speed Networks in the Parallel Environment LANDA-HSN
Communication Kernel for High Speed Networks in the Parallel Environment LANDA-HSN Thierry Monteil, Jean Marie Garcia, David Gauchard, Olivier Brun LAAS-CNRS 7 avenue du Colonel Roche 3077 Toulouse, France
More informationMotivation to Teach Network Hardware
NetFPGA: An Open Platform for Gigabit-rate Network Switching and Routing John W. Lockwood, Nick McKeown Greg Watson, Glen Gibb, Paul Hartke, Jad Naous, Ramanan Raghuraman, and Jianying Luo JWLockwd@stanford.edu
More informationAnalyzing the Receiver Window Modification Scheme of TCP Queues
Analyzing the Receiver Window Modification Scheme of TCP Queues Visvasuresh Victor Govindaswamy University of Texas at Arlington Texas, USA victor@uta.edu Gergely Záruba University of Texas at Arlington
More informationExecuting Legacy Applications on a Java Operating System
Executing Legacy Applications on a Java Operating System Andreas Gal, Michael Yang, Christian Probst, and Michael Franz University of California, Irvine {gal,mlyang,probst,franz}@uci.edu May 30, 2004 Abstract
More informationIntroduction to Ethernet Latency
Introduction to Ethernet Latency An Explanation of Latency and Latency Measurement The primary difference in the various methods of latency measurement is the point in the software stack at which the latency
More informationAccelerated Library Framework for Hybrid-x86
Software Development Kit for Multicore Acceleration Version 3.0 Accelerated Library Framework for Hybrid-x86 Programmer s Guide and API Reference Version 1.0 DRAFT SC33-8406-00 Software Development Kit
More informationDistributed Deadlock Detection for. Distributed Process Networks
0 Distributed Deadlock Detection for Distributed Process Networks Alex Olson Embedded Software Systems Abstract The distributed process network (DPN) model allows for greater scalability and performance
More informationImplementation and Analysis of Large Receive Offload in a Virtualized System
Implementation and Analysis of Large Receive Offload in a Virtualized System Takayuki Hatori and Hitoshi Oi The University of Aizu, Aizu Wakamatsu, JAPAN {s1110173,hitoshi}@u-aizu.ac.jp Abstract System
More information