An Extensible Message-Oriented Offload Model for High-Performance Applications

Patricia Gilfeather and Arthur B. Maccabe
Scalable Systems Lab
Department of Computer Science
University of New Mexico

Abstract

In this paper, we present and validate a new model that captures the benefits of protocol offload in the context of high-performance computing systems. In contrast to the LAWS model, the extensible message-oriented offload model (EMO) emphasizes communication in terms of messages rather than flows. In contrast to the LogP model, EMO emphasizes the performance of the network protocol rather than the parallel algorithm. The extensible message-oriented offload model allows us to consider benefits associated with the reduction in message latency along with benefits associated with reduction in overhead and improvements to throughput. We show how our model can be mapped to the LAWS and LogP models and we present preliminary results to verify the model.

1 Introduction

Network speeds are increasing. Both Ethernet and InfiniBand are currently promising 10 Gb/s performance, and Gigabit performance is now commonplace. Offloading all or portions of communication protocol processing to an intelligent NIC (Network Interface Card) is frequently used to ensure that the benefits of these technologies are available to applications. However, determining what portions of a protocol to offload is still more of an art than a science. Furthermore, there are few tools to help protocol designers choose appropriate functionality to offload. Shivam and Chase created the LAWS model to study the benefits and tradeoffs of offloading [2]. However, there are no models that address the specific concerns of high-performance computing. We create a model that explores offloading of commodity protocols for individual messages, which allows us to consider offloading performance for message-oriented applications and libraries like MPI.

In this paper, we provide a new model, the extensible message-oriented offload model (EMO), that allows us to evaluate and compare the performance of network protocols in a message-oriented offloaded environment. First, we overview two popular performance models, LAWS and LogP. Second, we review the characteristics of high-performance computing and show how the current models do not meet the specific needs of modeling for high-end applications. Third, we introduce EMO, a language for capturing the performance of various offload strategies for message-oriented protocols.

We explain a model of overhead and latency using EMO and map EMO onto LAWS. Finally, we present our preliminary results in verifying the model by comparing modeled latencies for interrupt coalescing with actual results.

2 Previous Models of Communication Performance

There are two performance models we considered before creating one that was specific to the needs of high-performance computing.

2.1 LogP

LogP was created as a new model of parallel computation to replace the outdated data-oriented PRAM model. It is based on the following four parameters:

L - upper bound on latency
o - protocol processing overhead
g - minimum interval between message sends or message receives
P - number of processors

LogP is message-oriented. The original model assumes short messages, but several extensions to LogP for large message sizes have been proposed. However, its focus is on the execution of the entire parallel algorithm and not on the performance of the particular network protocol. The LogP model gives us the understanding that the overhead and gap of communications must be minimized to increase the performance of parallel algorithms. It does not, however, give us any insight into how this may be done.

2.2 LAWS

The LAWS model was created to begin to quantify the debate over offloading the TCP/IP protocol. LAWS attempts to characterize the benefits of transport offload. It is based on the following four ratios:

Lag ratio (α) - the ratio of host processing speed to NIC processing speed
Application ratio (γ) - the ratio of application processing to communication processing: how much CPU the application needs
Wire ratio (σ) - the ratio of the bandwidth the host can deliver with its CPU fully utilized to the raw bandwidth of the wire
Structural ratio (β) - the ratio of overhead for communication with offload to overhead without offload: what processing was eliminated by offload

The LAWS model effectively captures the benefits and constraints of protocol processing offload. Furthermore, because the ratios are independent of a particular protocol, LAWS is extensible. When extending LAWS to an application-level library, the application ratio (γ) must reflect the additional overhead associated with moving data and control from the operating system (OS) to the application library, but this is trivial. However, LAWS is stream-oriented and not message-oriented. Specifically, it cannot help us to understand how to minimize gap or latency, which are primary needs for our model. LAWS is a good model of the behavior of offloading transport protocols. We provide a mapping from our high-performance message-oriented model to the LAWS model in section 3.3 so we may benefit from the understanding that the LAWS model brings to the question of how and when to offload.
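To make the two vocabularies concrete, the following small Python sketch (ours, not code from either the LogP or LAWS papers) computes the LogP delivery time of a small message and the four LAWS ratios directly from the definitions above. All numeric values are hypothetical.

    # Illustrative sketch of LogP costs and LAWS ratios; values are made up.

    def logp_one_way(L, o):
        """LogP time to deliver one small message:
        send overhead + wire latency + receive overhead."""
        return o + L + o

    def logp_n_messages(L, o, g, n):
        """Time for one processor to send n small messages back to back;
        successive injections are spaced by max(o, g)."""
        return (n - 1) * max(o, g) + logp_one_way(L, o)

    def laws_ratios(host_speed, nic_speed, app_work, comm_work,
                    offload_overhead, baseline_overhead,
                    host_bandwidth, wire_bandwidth):
        """The four LAWS ratios, straight from their definitions."""
        alpha = host_speed / nic_speed               # lag ratio
        gamma = app_work / comm_work                 # application ratio
        sigma = host_bandwidth / wire_bandwidth      # wire ratio
        beta = offload_overhead / baseline_overhead  # structural ratio
        return alpha, gamma, sigma, beta

    # Example with hypothetical numbers (times in microseconds):
    print(logp_one_way(L=5.0, o=2.0))                  # 9.0
    print(logp_n_messages(L=5.0, o=2.0, g=3.0, n=10))  # 36.0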

3 Extensible Message-Oriented Offload Model

Neither the LAWS model nor the LogP model helps us to evaluate methods for offloading network protocols in the high-performance computing environment. LAWS is not message-oriented and so it does not allow us to model either gap or latency. LogP is not specifically oriented to network protocol performance. We needed a new model of communication in high-performance computing.

3.1 Requirements for a High-Performance Model

We wanted to create a simple language for modeling methods of offload in order to understand how they relate to high-end applications. In addition to the ability to model latency, gap and overhead, we had three requirements for our performance model:

- We wanted the model to extend through all layers of a network protocol stack, including message libraries like MPI at the application layer.
- We wanted to model offload onto a NIC, as this was our primary focus.
- We wanted to model behavior in a message-oriented environment.

3.1.1 Extensible

Extensibility is necessary for our model because network protocols are often layered. Layered above the network protocols are more layers of message-passing APIs and languages like MPI and LINDA. We developed our model to extend through the layers of network protocols and message-passing APIs.

For example, one of the reasons that TCP has not been considered competitive in high-performance computing is that the MPI implementations are not efficient. The MPI implementations over TCP are generally not well integrated into the TCP protocol stack. A zero-copy TCP implementation still requires a copy in application space as the MPI headers are stripped, the MPI message is matched and the MPI data is moved to the appropriate application buffer. A zero-copy implementation of MPI will require a way to strip headers and perform the MPI match at the NIC level. Again, application libraries like LINDA are implemented on top of MPI. The same process will continue through all layers of the communication stack. We want our model to be extensible so we can capture this behavior.

3.1.2 Offload

Offloading parts or all of the processing of a protocol in order to decrease overhead has been commonplace for years. In the commodity markets of Internet serving and file serving, TCP offload engines (TOEs) are becoming more common as they attempt to compete with other networks like FibreChannel. In high-performance computing, Myrinet, VIA, Quadrics and InfiniBand all do some or all of their protocol processing on a NIC or on the network itself. Offload is an attractive way to keep overheads on the host low. Unfortunately, all of these networking solutions are either unavailable, not scalable, or very expensive. Our goal in producing this model was to provide a way to explore whether smart offloading of a commodity protocol like IP or TCP could eventually make these protocols competitive in the high-end computing arena. However, we have found this model useful in developing strategies for offloading various protocols.

Offloading is the central focus of this model. Like the LAWS model, the goal of this model is to explore the benefits of offloading transport protocol processing. Unlike the LAWS model, we are doing so in the context of message-oriented high-performance applications.

3.1.3 Message-Oriented

We need a performance model that is message-oriented so that we can specifically model and compare methods of offloading that decrease overhead or latency. Because we assume a low-loss network with application-level flow control, we choose to focus exclusively on protocol behavior at the sending and receiving hosts. To this end, we assume the network does not affect either the overhead of a message or its latency. Clearly, the network affects latency, but we are concerned with how host protocol processing affects latency.

The message-oriented nature of the model also provides the structure necessary to model gap in a new way. The message-oriented nature of the model, along with the emphasis on the communication patterns of a single host, allows us to focus on the benefits of offloading protocol processing specifically as a measure of overhead and gap.

3.2 EMO

We wanted a performance model that is not specific to any one protocol, but our choices were informed by our understanding of MPI over TCP over IP. The Extensible Message-Oriented Offload model (EMO) is shown in Figure 1. The latency and overhead necessary to communicate between components must include the movement of data when appropriate.

[Figure 1. The Extensible Message-Oriented Offload Model: the NIC (CPU rate R_N, protocol overhead C_N), the Host OS (CPU rate R_H, protocol overhead C_H), and the Application (protocol overhead C_A), connected by links with latencies L_NH, L_HA, L_NA and overheads O_NH, O_HA, O_NA.]

The variables for this model are as follows:

C_N - number of cycles of protocol processing on the NIC
R_N - rate of the CPU on the NIC
L_NH - time to move data and control from the NIC to the Host OS
C_H - number of cycles of protocol processing on the Host
R_H - rate of the CPU on the Host
L_HA - time to move data and control from the Host OS to the Application
L_NA - time to move data and control from the NIC to the Application
C_A - number of cycles of protocol processing at the Application
O_NH - number of host cycles to move data and control from the NIC to the Host OS
O_HA - number of host cycles to move data and control from the Host OS to the Application
O_NA - number of host cycles necessary to communicate and move data from the NIC to the Application

3.2.1 Extensibility

The model allows for extensibility with respect to protocol layers. We hope this model can be useful for researchers working on offloading parts of the MPI library (like MPI MATCH) or parts of the matching mechanisms for any language or API. We constructed the model so that it can grow through levels of protocols. For example, our model can be extended, or telescoped, to include offloading portions of MPI; we simply add C_M, L_AM and O_AM to the equations for overhead and latency.
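As an illustration (our sketch, not an artifact of the paper), the EMO parameter set can be collected into a single Python record; the field names mirror the list above, and the MPI extension fields mentioned in the previous sentence would be added in the same style.

    # Sketch collecting the EMO parameters; values are supplied by the user.
    from dataclasses import dataclass

    @dataclass
    class EMOParams:
        C_N: float   # cycles of protocol processing on the NIC
        R_N: float   # rate of the NIC CPU (cycles/second)
        C_H: float   # cycles of protocol processing on the host
        R_H: float   # rate of the host CPU (cycles/second)
        C_A: float   # cycles of protocol processing at the application
        L_NH: float  # time to move data/control from NIC to host OS
        L_HA: float  # time to move data/control from host OS to application
        L_NA: float  # time to move data/control from NIC to application
        O_NH: float  # host cycles to move data/control from NIC to host OS
        O_HA: float  # host cycles to move data/control from host OS to application
        O_NA: float  # host cycles to move data/control from NIC to application

    # Telescoping to an MPI layer would add fields such as C_M, L_AM and O_AM,
    # following the paper's naming convention for an added layer.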

3.2.2 Overhead

EMO allows us to explore the fundamental cost of any protocol, its overhead. Overhead occurs at the per-message and per-byte level. Our model allows us to estimate and graphically represent our understanding of overhead for various levels of protocol offload. Overhead is modeled as

    Overhead = O_NH + O_HA + C_A + O_NA

However, any given method will use only some of these communication paths to process the protocol. Traditional protocol processing, for example, does not use the communication path between the NIC and the application and does no processing at the application:

    Traditional Overhead = O_NH + O_HA

3.2.3 Gap

Gap is the interarrival time of messages to an application on a receive and the interdeparture time of messages from an application on a send. It is a measure of how well-pipelined the network protocol stack is. But gap is also a measure of how well-balanced the system is. If the host processor is processing packets for a receive very quickly but the NIC cannot keep up, the host processor will starve and the gap will increase. If the host processor is not able to process packets quickly enough on a receive, the NIC will starve and the gap will increase. If the network is slow, both the NIC and the host will starve. As we minimize gap, we balance the system.

3.2.4 Latency

Latency is modeled as

    Latency = L_NH + L_HA + L_NA + C_A/R_H

Again, any given method will use only some of these communication paths. Traditional network protocols, for example, do not use the communication path between the NIC and the application and do no processing at the application:

    Traditional Latency = L_NH + L_HA
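The sums above can be written down directly. The following sketch is ours, and it assumes that the application processing term C_A is converted to time by the host CPU rate R_H in the latency equation; the overhead equation stays in host cycles.

    # Sketch of the EMO overhead and latency sums for the full and
    # traditional paths. Units: O_* and C_A in host cycles, L_* in seconds.

    def emo_overhead(O_NH, O_HA, C_A, O_NA):
        """Full-path overhead in host cycles."""
        return O_NH + O_HA + C_A + O_NA

    def traditional_overhead(O_NH, O_HA):
        """Traditional path: no NIC-to-application leg, no application processing."""
        return O_NH + O_HA

    def emo_latency(L_NH, L_HA, L_NA, C_A, R_H):
        """Full-path latency in seconds; C_A is converted to time via R_H
        (our reading; the extracted text leaves the conversion implicit)."""
        return L_NH + L_HA + L_NA + C_A / R_H

    def traditional_latency(L_NH, L_HA):
        """Traditional path latency: NIC-to-host plus host-to-application."""
        return L_NH + L_HA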

3.3 Mapping EMO onto LAWS

EMO can be mapped directly onto LAWS, which is useful in order to provide a context for the model in the larger offload community. Because LAWS concentrates on an arbitrary number of bytes in a specified amount of time and EMO concentrates on an arbitrary amount of time for a specified number of bytes, we have to make a few assumptions. The parameters that make up the ratios in the LAWS model are:

o - CPU occupancy for communication overhead per unit of bandwidth
a - CPU occupancy for the application per unit of bandwidth
X - occupancy scale factor for host processing
Y - occupancy scale factor for NIC processing
p - portion of the communication overhead o offloaded to the NIC
B - bandwidth of the network path

LAWS assumes a fixed amount of time. Let the fixed amount of time be equal to the time to receive a message of length N on a host; we will call this time T. This allows us to determine the total number of host cycles possible:

    C_t = R_H * T

For simplicity, let us define an overhead total for EMO:

    O_T = O_NH + O_HA + C_A + O_NA

Now that we have a fixed time T, a fixed number of bytes N, and the total number of host cycles C_t, we can map EMO onto the LAWS parameters. The most difficult part of the mapping from EMO to LAWS is the fact that the communication overhead o is constant while the percentage offloaded p is variable. Thus, p is a ratio used to compare two different offload schemes. Our offload schemes are modeled with different values for C_N and C_H to reflect this difference. We denote a second offload scheme with primed variables, so that C'_H represents the protocol processing left on the host and C_H - C'_H the amount moved to the NIC. We assume that C'_N is incremented by C_H - C'_H, since this is the assumption of the LAWS model: offloaded processing moves to the NIC rather than being eliminated. Changes to the actual amount of protocol processing under various offload schemes are reflected in the LAWS model only through the ratio β; in EMO they are reflected directly in different variable values. The mapping is:

    o = O_NH + O_HA + C_A + O_NA = O_T
    a = C_t - O_T
    p = (C_H - C'_H) / o
    B = N / T

The occupancy scale factors X and Y follow from the CPU rates R_H and R_N, so that the lag ratio α equals R_H/R_N.

LAWS derives all of its ratios from these parameters with the exception of β. The structural ratio describes the amount of processing saved by using a particular offload scheme. We can quantify this directly from our model, with the second offload mechanism denoted by primed variables:

    β = (C'_N + O'_NH + C'_H + O'_HA + C'_A + O'_NA) / (C_N + O_NH + C_H + O_HA + C_A + O_NA)

Now we have all of the necessary elements to map EMO onto LAWS. This is useful for understanding how the EMO model fits into the larger area of offloading of protocol processing in commodity applications. We created EMO for high-end computing so we can explore gap and overhead for message-oriented applications.
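A small sketch (ours) of this mapping follows; C_H_2 stands in for the primed C'_H, the protocol processing left on the host under the second offload scheme.

    # Sketch of the EMO-to-LAWS mapping, per message of N bytes received in time T.

    def emo_to_laws(N, T, R_H, O_NH, O_HA, C_A, O_NA, C_H, C_H_2):
        C_t = R_H * T                   # total host cycles available in time T
        O_T = O_NH + O_HA + C_A + O_NA  # EMO overhead total
        o = O_T                         # communication occupancy
        a = C_t - O_T                   # cycles left for the application
        B = N / T                       # bandwidth of the network path
        p = (C_H - C_H_2) / o           # portion of overhead moved to the NIC
        return {"C_t": C_t, "o": o, "a": a, "B": B, "p": p}

    def structural_ratio(primed_terms, base_terms):
        """beta: summed processing under the primed scheme over the base scheme."""
        return sum(primed_terms) / sum(base_terms)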

4 Model Verification - Initial Results

We created EMO as a language for comparing methods of offload, but it can be considered an analytical model as well. Our initial verification concerned interrupt coalescing. We verified our model for the latency of UDP with no interrupt coalescing and with default interrupt coalescing.

We measured latencies by creating a ping-pong test between Host A and Host B. Host A remains constant throughout the measurements. Host A is a 933 MHz Pentium III running an unmodified Linux kernel with the Acenic Gigabit Ethernet card set to default values for interrupt coalescing and transmit ratios. Host B is the same machine, connected to Host A by cross-over fiber. Host B also runs an unmodified version of the Linux kernel.

We measured overhead by modifying our ping-pong test. Host A continues the ping-pong test, but Host B includes a cycle-soaker that counts the number of cycles that can be completed while communication is in progress.

4.1 Latency

In order to verify the model for latency, we measured actual latency and approximated measurements for the various parts of the sum in our equation:

    Traditional Latency = L_NH + L_HA

Our model is verified to the extent that the sum of the addends approximates the actual measured latency.

4.1.1 Application to Application Latency

In order to measure the traditional latency, we ran a simple UDP echo server in user space on Host B. Host A simply measures ping-pong latency for various size messages. We measured this latency from 1-byte messages through 8900-byte messages. We wanted to remain within the jumbo frame size to avoid fragmentation and reassembly or multiple packets, but we wanted to exercise the crossing of page boundaries; the page size for the Linux kernel is 4 KB. We measured application to application latency when Host B had default interrupt coalescing parameters set and also when Host B had interrupt coalescing turned off.

4.1.2 Application to NIC Latency

In order to measure application to NIC latency, we moved the UDP echo server into the Acenic firmware. This allows us to measure the latency of a message as it travels through Host A, across the wire, and to the NIC on Host B. This latency should not reflect the cost of the interrupt on Host B, the cost of moving through the kernel receive or send paths on Host B, nor the cost of the copy of data into user space on Host B. The UDP echo server exercises all of the code in the traditional UDP receive and send paths in the Acenic firmware with the exception of the DMA to the host, so we assume that the application to NIC latency measurement includes the C_N/R_N portion of the latency calculation. However, it is important to note that the startup time for the DMA engine on the Acenic cards is approximately 5 µs. This will be accounted for in the L_NH portion of the calculation.

4.1.3 Application to Kernel Latency

For our initial results, we chose to measure the latency between an application on Host A and the kernel on Host B using the ping utility already provided by the kernel. This was an easy measurement to procure and should give a reasonable approximation of the latency between the application and the kernel. While the ping message does not travel the exact code path of a UDP message in the kernel, it does exercise the same IP path and very similar code at the level above IP. The ICMP (ping) message does not perform a copy of data to user space and does not perform a route lookup. Unfortunately for our purposes, the ping utility does perform a checksum, so the application to kernel measurements will be linearly dependent on the size of the message. This is a major difference. In the future, we would like to add the echo server into the kernel so as to be as consistent as possible in our calculations.

The application to kernel latency was measured with an unmodified Host A with default interrupt coalescing, and with Host B with both default interrupt coalescing and no interrupt coalescing. We expect that the differences between interrupt coalescing and no interrupt coalescing should be present at this level.
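The paper does not include its test harness; the following is a hypothetical user-space analogue of the ping-pong latency measurement described at the start of this section, with made-up host name, port, sizes and iteration count.

    # Hypothetical stand-in for the UDP ping-pong latency test.
    import socket
    import time

    def pingpong_latency(host, port, size, iters=1000):
        """Average one-way latency in microseconds of a UDP echo round trip."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(2.0)
        msg = b"x" * size
        start = time.perf_counter()
        for _ in range(iters):
            sock.sendto(msg, (host, port))
            sock.recv(65536)  # block until Host B echoes the message back
        elapsed = time.perf_counter() - start
        return elapsed / iters / 2 * 1e6  # half the round trip, in microseconds

    for size in (1, 1024, 4096, 8900):
        print(size, pingpong_latency("hostB.example", 7, size))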

4.1.4 Results

The expectation is that the average latency will be generally smaller when there is no interrupt coalescing. This is shown in our model. Interrupt coalescing can be seen as a move to decrease the overhead effect of the interrupt, O_NH, at the cost of the time before an interrupt reaches the host. This means we expect that L_NH is the variable affected.

We cannot currently fully verify this result. We have no reasonable way to isolate the processing on the NIC and have not fully measured processing time or overhead in the kernel. However, Figure 2 shows that latency is generally slightly lower when Host B disables interrupt coalescing. If we let X be the latency of all communication on Host A, the wire, and the NIC on Host B, our actual latency can be represented as

    X + L_NH + L_HA

[Figure 2. Application to Application Latency: average latency in microseconds, default interrupt coalescing vs. no interrupt coalescing.]

Moreover, Figure 3 further isolates this phenomenon by approximating the ping-pong measurement without the final move from kernel to application space. Figure 3 represents a comparison between

    X + L_NH

with no interrupt coalescing and the interrupt-coalescing latency

    X + L'_NH

[Figure 3. Application to Kernel Latency: average latency in microseconds, default interrupt coalescing vs. no interrupt coalescing.]

Our original intent was to use the application to NIC measurements to isolate X and, by subtraction, L_NH. However, Figure 4 shows clearly that our application to NIC measurements are not useful. Because the slope is greater than the slope of the actual latency, it appears that our UDP echo server is touching the data more often than the traditional application echo server. At the very least, the UDP echo server is clearly touching the data, and the speed of the NIC (88 MHz) is exacerbating the slowdown. We expect this to be a temporary problem, as there is no design issue that precludes returning the data without touching it. This is simply an implementation flaw.

[Figure 4. Latency with No Interrupt Coalescing: actual, application to kernel, and application to NIC measurements.]

Figure 4 also presents us with another artifact dissimilar from the model.

From our preliminary discussions of EMO, we assumed that L_HA is linear with the size of the message because the copy of data occurs here. However, the application to kernel route is clearly also touching all data. This is presumably because the kernel is checksumming the IP packets and therefore touching the data as well. Clearly, the UDP echo server on the NIC must be made useful and the application to kernel path cannot include the checksum in order for us to fully verify our model. However, the insights provided by the model for interrupt coalescing are verified by the measurements.

4.2 Overhead

In order to verify the model for overhead, we measured actual overhead and approximated measurements for the various parts of the sum in our equation:

    Traditional Overhead = O_NH + O_HA

Our model is verified to the extent that the sum of the addends approximates the actual measured overhead.

4.2.1 Application to Application Overhead

We measured the amount of cycle-soak work Host B can do without any communication occurring. Then we measured the amount of cycle-soak work Host B can do with standard ping-pong communication of various sized messages occurring between an application on Host A and a UDP echo server on Host B. The difference between these two amounts of work is the overhead associated with the communication: the number of cycles taken away from computation.

We measured the overhead of application to application communication with default interrupts on Host A, and on Host B with both default interrupts and no interrupts. We expect that the overhead of application to application communication when Host B is using interrupt coalescing will be lower than when Host B is not using interrupt coalescing.

4.2.2 Kernel to Application Overhead

In order to measure the overhead for kernel to application communication, Host A ran a ping flood on Host B and Host B ran the cycle-soak work calculation. We expect that interrupt coalescing will still make a difference at this level of communication, so that Host B with no interrupt coalescing will have higher overhead than Host B with default interrupt coalescing. However, we do not expect the size of the message to make as much of a difference in the communication overhead at this level as it does at the application to application level.

4.2.3 NIC to Application Overhead

In order to measure the overhead for NIC to application communication, Host B is run with the modified Acenic firmware with the UDP echo server at the NIC level. Host A runs the UDP ping-pong test and Host B runs the cycle-soak work calculation. We expect quite low overhead on Host B, as there is no host involvement with the communication and therefore no communication overhead.
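The cycle-soaker itself is straightforward; this is a hypothetical stand-in (ours) that counts loop iterations completed in a fixed interval, with and without competing communication, so that the shortfall approximates the cycles consumed by protocol processing.

    # Hypothetical cycle-soaker: work completed quietly minus work completed
    # under communication load approximates communication overhead.
    import time

    def soak(seconds):
        """Spin for `seconds` and return the number of iterations completed."""
        done = 0
        deadline = time.perf_counter() + seconds
        while time.perf_counter() < deadline:
            done += 1
        return done

    baseline = soak(5.0)  # quiet system: no communication in progress
    # ... start the ping-pong traffic against this host, then:
    loaded = soak(5.0)    # system under communication load
    print("work lost to communication:", baseline - loaded)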

4.2.4 Results

As expected, there was no communication overhead when the UDP echo server runs at the NIC level. This verified our modeled expectations.

We expected that the overhead for application to application communication would be lower when Host B employed interrupt coalescing. Figure 5 shows that the difference is negligible at best, but this reflects general results regarding interrupt coalescing and its efficacy in lowering overhead [1].

[Figure 5. Application to Application Overheads: average overhead in cycles, default interrupt coalescing vs. no interrupt coalescing.]

Moreover, Figure 6 shows that this effect also follows through the kernel to application communication path, as expected, since the interrupt is still present in this path.

[Figure 6. Kernel to Application Overheads: average overhead in cycles, default interrupt coalescing vs. no interrupt coalescing.]

The most interesting result, however, does not verify our assumptions regarding the EMO model. The overhead for application to application communication does not increase with the size of the message as we expected. Figure 7 shows that the gap that represents O_HA remains constant rather than increasing as the size of the message increases. The overall overhead also remains constant. Clearly, for small messages, the overhead is predominantly the interrupt overhead. Measurements for much larger messages should reveal overheads that begin to slope with the size of the message. These results should bring a clearer understanding of the role of the memory subsystem in EMO.

[Figure 7. Overhead with Default Interrupt Coalescing: actual and to-kernel average overhead in cycles.]

5 Conclusions and Future Work

The extensible message-oriented offload model (EMO) allows us to explore the space of network protocol implementation from the application messaging layers through to the NIC on a message-by-message basis. This new language gives us a fresh understanding of the role of offloading in terms of overhead, latency and gap in high-performance systems.

The preliminary work on verification has yielded areas of further research. A more effective means of isolating various parts of the message path must be created before verification can be complete. The next steps are a profiling of the kernel and a reimplementation of the UDP echo server at the NIC and in the kernel. Clearly, the model remains conceptual until all variables can be isolated and all assumptions can be verified.

EMO as a language for exploring offload design is already being used. We plan to explore the model of gap in EMO to bound the resource requirements for NICs or TCP offload engines at 10 Gb/s speeds. We also plan to extend EMO to include memory management considerations such as caching.

References

[1] P. Gilfeather and T. Underwood. Fragmentation and high performance IP. In Proc. of the 15th International Parallel and Distributed Processing Symposium (IPDPS), April 2001.

[2] P. Shivam and J. Chase. On the elusive benefits of protocol offload. In Proc. of the ACM SIGCOMM Workshop on Network-I/O Convergence: Experience, Lessons, Implications (NICELI), August 2003.
