
Technische Universität München
Department of Informatics

Bachelor's Thesis in Informatics

Comparison of Network Interface Controllers for Software Packet Processing
Vergleich von Netzwerkkarten zur Paketverarbeitung in Software

Author:      Alexander P. Frank
Supervisor:  Prof. Dr.-Ing. Georg Carle
Advisors:    Paul Emmerich, Sebastian Gallenmüller, and Dominik Scholz
Date:        September 15, 2017

Informatik VIII
Chair for Network Architectures and Services


I confirm that this thesis is my own work and I have documented all sources and material used.

Garching b. München, September 11, 2017

Signature


Abstract

To test and benchmark network devices and other networking infrastructure, packet generators provide the means to generate precisely defined traffic patterns. MoonGen is a high-speed packet generator which additionally provides hardware timestamping capabilities. MoonGen is built upon the Data Plane Development Kit (DPDK), which provides its own poll mode drivers for the supported network interface controllers (NICs). While DPDK abstracts the hardware to a certain degree, it still exposes highly hardware-specific features. Despite the DPDK abstraction layer, the fundamental software and hardware architectures of NICs used with DPDK vary considerably. Because DPDK was originally developed by Intel, its software architecture strongly resembles the software stack of Intel NICs. Mellanox's main products are InfiniBand-enabled adapters; therefore, the software stack used by Mellanox NICs together with DPDK is based on the InfiniBand software stack, even though Ethernet variants of those adapters are used. InfiniBand is an alternative interconnect technology to Ethernet which provides similar data rates with lower latencies but requires specialized hardware. As DPDK provides its own drivers, not all features available at the hardware level are actually supported in software. While each device reports statistics and other information through DPDK's interface, those values must often be interpreted differently depending on the NIC in use. In order to enable truly uniform NIC behavior in MoonGen, additional adjustments are made to resolve inconsistencies in DPDK. There are also differences in terms of hardware functionality; however, the basic performance values and external interfaces are comparable.


Contents

1 Introduction
  1.1 Goals of the Thesis
  1.2 Outline
2 MoonGen
3 InfiniBand
  3.1 InfiniBand Architecture
  3.2 InfiniBand Layers
  3.3 Remote Direct Memory Access
  3.4 InfiniBand Data Rates
  3.5 InfiniBand Stack
  3.6 Possible Advantages of InfiniBand
4 Vendors
  4.1 Intel
  4.2 Mellanox
5 Hardware
  5.1 General Information
    5.1.1 Data sheets
    5.1.2 Byte ordering
  5.2 External Interfaces
    5.2.1 Network Interface
    5.2.2 PCI Express
    5.2.3 NC-SI
    5.2.4 SMBus
    5.2.5 General-Purpose I/O
  5.3 Hardware Features
    5.3.1 Checksum Offloads
    5.3.2 Timestamping
6 Drivers
  6.1 Feature Overview
  6.2 MLX5 Poll Mode Driver
    6.2.1 Filter
    6.2.2 Statistics
    6.2.3 Timestamping
  6.3 I40E Poll Mode Driver
    6.3.1 Filter
    6.3.2 Statistics
  6.4 IXGBE Driver
    6.4.1 Filter
    6.4.2 Statistics
7 Conclusion
  7.1 Future work
8 Acronyms
Bibliography

List of Figures

2.1 NIC configuration with MoonGen
3.1 Basic InfiniBand architecture
3.2 InfiniBand layers
3.3 RDMA in contrast to a traditional interconnect
3.4 InfiniBand software stack
5.1 710 series MAU lane to physical lane mapping
5.2 PTP sync flow
5.3 HCA operation
6.1 Control and data flow between MoonGen and ConnectX-4 Lx
6.2 Control and data flow between MoonGen and XL710
6.3 Control and data flow between MoonGen and X550T


List of Tables

3.2 Current InfiniBand data rates
3.3 Comparison of Ethernet and InfiniBand data rates
4.1 Intel 10/25/40 GbE Ethernet controllers
4.2 Mellanox ConnectX series
5.1 710 series MAC clock frequencies
6.1 Tested NICs and their associated drivers
6.2 Support for CPU architectures on ixgbe, i40e, and mlx5 drivers
6.3 Other features of ixgbe, i40e, and mlx5 drivers
6.4 Available statistics on ixgbe, i40e, and mlx5 drivers
6.5 Available status information on ixgbe, i40e, and mlx5 drivers
6.6 Support for filtering on ixgbe, i40e, and mlx5 drivers
6.7 Support for offloading on ixgbe, i40e, and mlx5 drivers
6.8 Interrupt and processor related features of ixgbe, i40e, and mlx5 drivers
6.9 Other features of ixgbe, i40e, and mlx5 drivers
6.10 Matching pattern for TCP traffic over IPv4


Chapter 1

Introduction

Intel and Mellanox both produce Gigabit-capable networking equipment such as network interface controllers and, therefore, compete in high-performance segments. Intel provides an extensive amount of documentation on their controllers, whereas Mellanox released the Mellanox Adapters Programmer's Reference Manual (PRM) for ConnectX-4 and ConnectX-4 Lx based adapters for the first time in 2016. We use this new information source for a comparison of selected Intel and Mellanox network interface controllers.

1.1 Goals of the Thesis

This thesis has two main goals. Goal one is to enable MoonGen/libmoon to utilize network interface controllers (NICs) from Mellanox the same way as Intel's. For this purpose, the following tasks need to be solved:

- Configure the build system in a way that additional dependencies from Mellanox are handled without the need for a user to modify MoonGen/libmoon resources.
- Enable hardware filtering on Mellanox NICs.
- Enable uniform packet counting to make statistics between NICs comparable.

The source code used to achieve those goals can be found in the MoonGen and libmoon repositories.

The other goal, which is the focus of this paper, is a comparison between modern NICs from Intel and Mellanox, especially in the context of the Data Plane Development Kit (DPDK). This work is not intended as a performance comparison; rather, we investigate the features provided by the hardware which are useful for the tasks described above. We investigate to which degree those features, if available in hardware, are also supported by the affiliated drivers.

1.2 Outline

The thesis consists of eight chapters. In Chapter 2 we give a brief introduction to the packet generator MoonGen which sets the context for this work. Chapter 3 provides an overview of InfiniBand, which is tightly connected to Mellanox NICs. After that, Chapter 4 covers Intel and Mellanox and their portfolio in regard to networking. Chapter 5 compares the hardware of the investigated NICs based on interfaces and selected features such as timestamping and available offloads. Chapter 6 then presents the different drivers and their support in DPDK. It also explains some of the actions taken to achieve the inclusion of Mellanox NICs in MoonGen/libmoon. We summarize our findings in Chapter 7. For easier lookup, Chapter 8 lists several of the less common acronyms used throughout this thesis.

Chapter 2

MoonGen

This whole work is done in the context of MoonGen. Therefore, we give a short introduction to the subject; additional information can be found on the GitHub page of MoonGen. [1]

MoonGen extends libmoon, which is a Lua wrapper for the Data Plane Development Kit (DPDK). DPDK interfaces with the NICs by providing its own poll mode drivers. Therefore, there is no interrupt processing overhead that would affect performance. Additionally, DPDK includes several libraries which provide a uniform programming environment, abstracting the specific hardware while still making use of its features. Note that DPDK was created by Intel, which remains its main contributor. This is reflected in the structure of DPDK, which is based on the standard software stack used by Intel NICs. However, other networking related companies, including Mellanox, also contribute to DPDK and support it with their hardware. [2]

Originally, MoonGen and libmoon together formed the packet generator MoonGen. Since then, libmoon has been split off to be available as a standalone Lua wrapper, while MoonGen now builds on libmoon by extending its capabilities (see also Figure 2.1). MoonGen and libmoon are used by writing a userscript against the scripting API. MoonGen provides all functionality of libmoon and extends it by additional functionality used for packet generation. Scripts are written in Lua and are processed by the LuaJIT compiler. The userscript contains a master function which initializes all required NICs. Modern NICs provide multiple receive/transmit queues which hold the received packets and the packets to be transmitted, respectively. Each queue can be assigned to a different CPU core, which increases performance by distributing the necessary processing across those cores. For MoonGen to make use of multiple CPUs and multiple receive/transmit queues, the userscript, originally running as the master task, spawns slave tasks as independent Lua VMs. Each slave task runs an assigned slave function. This approach enables linear scaling across multiple cores and allows data rates higher than 100 Gbit/s.
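To illustrate the poll mode driver model described above, the following minimal C sketch shows the kind of busy-polling receive loop a DPDK application (and thus libmoon underneath MoonGen) runs on each core. It is only an illustration under simplified assumptions: port, queue, and memory pool setup are omitted, and the function names are standard DPDK API calls.

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 64

    /* Busy-poll a single receive queue; no interrupts are involved. */
    static void rx_loop(uint16_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];

        for (;;) {
            /* Ask the poll mode driver for up to BURST_SIZE packets. */
            uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);

            for (uint16_t i = 0; i < nb_rx; i++) {
                /* ... process the packet here ... */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }

Conceptually, each MoonGen slave task runs such a loop on its own core and queue, which is what makes the linear scaling described above possible.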

Figure 2.1: NIC configuration with MoonGen. The interface offered by libmoon is a subset of MoonGen's interface; therefore, all libmoon scripts are valid for MoonGen. Based on [1]

MoonGen then implements additional functionality to support packet generation and latency measurements. This includes:

Hardware timestamping: Some Intel NICs allow packets to be timestamped on transmit as well as on receive. This is utilized by MoonGen to achieve a timestamp accuracy of below 100 ns.

Rate control: Sending packets via an API typically places the packet in a memory region accessible by the NIC. The NIC then decides when to fetch those packets. This may result in unintended bursts which are fatal to latency measurements at high transmission rates (> 1 Gbit/s). MoonGen solves this problem by placing a constant load on the NIC, sending invalid packets when no real data is to be transmitted. Invalid packets (e.g. with an incorrect CRC checksum) are usually dropped immediately by receiving NICs; therefore, they do not have an impact on the packet processing on the receiving NIC. This software-based rate control also enables complex traffic patterns such as a Poisson process.

Chapter 3

InfiniBand

This thesis is based on NICs using Ethernet as interconnect. However, as explained later in the brief overview of Mellanox, the InfiniBand technology is the basis of Mellanox's hardware portfolio. This is reflected in some of the design choices regarding Mellanox NICs, as the Ethernet adapters partially reuse the software stack designed for the InfiniBand devices. While the Ethernet standard as defined in IEEE 802.3 [3] is very popular, InfiniBand is mostly used in fields such as high-performance computing (HPC). This chapter gives an introduction to InfiniBand and highlights the differences compared to Ethernet.

Apart from Ethernet and InfiniBand, there are other interconnect solutions which are not investigated here. Those include Fibre Channel and several proprietary technologies. An example would be the Sunway interconnect, which was specially developed for the Chinese supercomputer Sunway TaihuLight (number one of the TOP500 list in June 2017) [4]. Intel also provides an interconnect solution called Omni-Path. InfiniBand has been maintained by the InfiniBand Trade Association (IBTA) since 1999.

3.1 InfiniBand Architecture

Like modern Ethernet, InfiniBand (IB) uses a switch-based fabric. This is realized via switches which can provide several data paths between the same two nodes. Therefore, it provides point-to-point bidirectional serial links between the end nodes of the fabric. An InfiniBand network typically consists of the following blocks (see also Figure 3.1):

- Host Channel Adapters (HCA)
- Target Channel Adapters (TCA)
- Switches
- Subnet Managers (SM)
- Routers
- Gateways

The HCA is the controller or adapter which connects a host system's bus to the InfiniBand network; an example would be any of the ConnectX series adapters from Mellanox. As mentioned earlier, the switches provide the links between nodes of the network. The existence of multiple paths between two end nodes allows higher aggregate bandwidth and also adds redundancy in case of failure of a link or switch. A TCA is a specialized version of an HCA which provides only a subset of an HCA's features. The Subnet Manager is a software entity which must be present at least once in every subnet. The software itself can reside within any entity of the subnet. Its main task is the configuration of switches and routers. Each device in the network contains a Subnet Management Agent (SMA). Via the SMA, the Subnet Manager communicates with all devices within its subnet. Routers can be used to connect different IB subnets with each other. Lastly, the Gateway can be used to connect networks which use different protocols, e.g. an InfiniBand network with an Ethernet based network. Connecting different protocols is also possible by fitting an end node with one network adapter per protocol. [5], [6]

Figure 3.1: Basic InfiniBand architecture. An InfiniBand based subnet with a router to connect to other IB subnets and a Gateway to connect to an Ethernet based network. Based on [5], [6]

Figure 3.2: InfiniBand can be divided into five layers: upper, transport, network, link, and physical layer. Based on [6]

3.2 InfiniBand Layers

Similar to the OSI model, the InfiniBand standard also defines several layers and their interaction (see Figure 3.2). InfiniBand's architecture covers layers from the physical connection up to the upper layers which contain any applications (kernel or user space). Within the same subnet, packets are switched and forwarded on the link layer based on their 16-bit local ID (LID). Additionally, the link layer provides QoS, guarantees data integrity, and ensures that packets cannot be lost. Between different subnets, IPv6 addresses are utilized. On the transport layer, each connection has two end points called Queue Pairs (QPs). A QP consists of a send and a receive queue which provide the messaging service and can be accessed directly by the owning application to avoid any involvement of the operating system. If more connections are required, new QPs must be created. [6], [7]
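The InfiniBand specification itself does not prescribe a concrete API for these concepts (see also the verbs discussion in Section 3.5). Purely as an illustration, the following C sketch shows how a reliable-connection QP could be created with the widely used libibverbs API from the OpenFabrics stack; device selection, error handling, and the subsequent connection setup are omitted, and the queue sizes are arbitrary.

    #include <infiniband/verbs.h>

    /* Open the first RDMA-capable device and create a queue pair on it. */
    static struct ibv_qp *create_example_qp(void)
    {
        struct ibv_device **devs = ibv_get_device_list(NULL);
        struct ibv_context *ctx = ibv_open_device(devs[0]);
        struct ibv_pd *pd = ibv_alloc_pd(ctx);               /* protection domain */
        struct ibv_cq *cq = ibv_create_cq(ctx, 256, NULL, NULL, 0);

        struct ibv_qp_init_attr attr = {
            .send_cq = cq,                 /* completions for sends */
            .recv_cq = cq,                 /* completions for receives */
            .cap = {
                .max_send_wr  = 128,       /* depth of the send queue */
                .max_recv_wr  = 128,       /* depth of the receive queue */
                .max_send_sge = 1,
                .max_recv_sge = 1,
            },
            .qp_type = IBV_QPT_RC,         /* reliable connection */
        };
        return ibv_create_qp(pd, &attr);   /* the send/receive queue pair */
    }

The application later posts work requests directly to these queues, which is exactly the operating-system bypass described above.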

3.3 Remote Direct Memory Access

Transferring data between applications on different physical machines via an interconnect traditionally involves several copies of the data when going through the network stack. This consumes time and CPU resources alike. To avoid this problem, InfiniBand uses Remote Direct Memory Access (RDMA). This enables zero-copy transfers, so that data is transferred directly between the RNIC (RDMA enabled NIC) and the application. The operating system is not involved in the process in any way (see Figure 3.3). It is important to note that for Ethernet, RDMA over Converged Ethernet (RoCE) can provide the same mechanism.

Figure 3.3: RDMA in contrast to a traditional interconnect. Based on [5]

3.4 InfiniBand Data Rates

InfiniBand's data rates have grown continually since the first specification in 2000. EDR has been available since 2014, and HDR is expected to be introduced later this year (2017). The different rates can be seen in Table 3.2. Physical lanes can be aggregated into one physical link, increasing the throughput. Currently available aggregations are 4x, 8x, and 12x. Multiple lanes are utilized by byte-striping the packet byte stream across all available physical lanes. For 8b/10b encoding, byte 0 is transmitted via lane 0, byte 1 via lane 1, and so on; in the case of 4x, byte 4 would again be transmitted via lane 0. With 64b/66b encoding, a more complex pattern is used to stripe the bytes along all lanes. [8]

Gigabit Ethernet standards also define byte striping along all available lanes. Those standards are often used in conjunction with LACP (link aggregation control protocol), which allows aggregating multiple physical links into one logical link. Data is then distributed to the underlying physical links on the frame level. LACP configures devices dynamically and can, therefore, detect link failures during operation. [9]

FDR10 is a proprietary protocol by Mellanox which is based on the FDR protocol but runs at the same speed per lane as 40 Gigabit Ethernet. This allows cables, connectors, and other hardware designed for Ethernet to be used for an InfiniBand based network. [11]

Name                   Abbr.   Year    Raw Signaling Rate   Applied Encoding   Effective Data Rate (1x)
Single Data Rate       SDR     2001    2.5 Gb/s             8b/10b             2 Gb/s
Double Data Rate       DDR     2005    5 Gb/s               8b/10b             4 Gb/s
Quad Data Rate         QDR     2007    10 Gb/s              8b/10b             8 Gb/s
Fourteen Data Rate 10  FDR10   2011    10.3125 Gb/s         64b/66b            10 Gb/s
Fourteen Data Rate     FDR     2011    14.0625 Gb/s         64b/66b            13.64 Gb/s
Enhanced Data Rate     EDR     2014    25.78125 Gb/s        64b/66b            25 Gb/s
High Data Rate         HDR     2017*   51.6 Gb/s            64b/66b            50 Gb/s
Next Data Rate         NDR     TBD     TBD                  TBD                TBD

Table 3.2: Current InfiniBand data rates. HDR is not yet released. Based on [5], [10]

Table 3.3 compares Ethernet and InfiniBand interconnect technologies with similar data rates, excluding those with a throughput lower than 1 Gb/s.

Ethernet standard   Throughput   Year   Equivalent InfiniBand Connection
1000BASE-X          1 Gb/s       1998   SDR 1x at 2 Gb/s (2001)
10GBASE-X           10 Gb/s      2002   FDR10 1x at 10 Gb/s (2011); QDR 1x at 8 Gb/s (2007); SDR 4x at 8 Gb/s (2001)
25GBASE             25 Gb/s      2016   QDR 4x at 32 Gb/s (2007); EDR 1x at 25 Gb/s (2014)
40GBASE-X           40 Gb/s      2010   FDR10 4x at 40 Gb/s (2011); DDR 12x at 48 Gb/s (2005)
100GBASE-X          100 Gb/s     2010   QDR 12x at 96 Gb/s (2007); EDR 4x at 100 Gb/s (2014); FDR10 12x at 120 Gb/s (2011)

Table 3.3: Comparison of Ethernet and InfiniBand data rates. Based on [3], [10]
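The effective per-lane rates in Table 3.2 follow from the raw signaling rate and the encoding overhead; the relation is not spelled out in the text, so as a short worked example:

\[
R_{\mathrm{eff}} = R_{\mathrm{raw}} \cdot \frac{k}{n},
\qquad \text{e.g. QDR: } 10\ \mathrm{Gb/s} \cdot \tfrac{8}{10} = 8\ \mathrm{Gb/s},
\qquad \text{EDR: } 25.78125\ \mathrm{Gb/s} \cdot \tfrac{64}{66} = 25\ \mathrm{Gb/s},
\]

where k/n is the payload fraction of the applied line code (8/10 for 8b/10b, 64/66 for 64b/66b). Aggregating lanes (4x, 8x, 12x) multiplies this per-lane rate accordingly.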

3.5 InfiniBand Stack

Figure 3.4 gives an overview of the InfiniBand stack. It is explained in greater detail in the following.

Figure 3.4: The InfiniBand software stack. Associated user-mode libraries are not shown. Based on figure 1 from [12]

InfiniBand can be used in two different ways. When the goal is maximum performance with the lowest possible latency, applications must be specifically written for InfiniBand to make use of all features; e.g., an application could directly interface with the verbs layer, which is described later. On the other hand, if the goal is the ability to use the same IP, SCSI, iSCSI, sockets, or file system based software for different interconnect technologies, this is also possible. On the user level, applications remain unmodified. They interact with the "Transparent and Standard Application Interfaces", which stands for an appropriate layer (or layers) which are indifferent to the underlying interconnect. For IPoIB this would correspond to all interfaces down to the IP layer of the Linux network stack.

The kernel level can be divided into three layers: user-level access modules, the mid-layer, and the lowest layer. The user-level access modules (the kernel level above the core modules) abstract the InfiniBand architecture; e.g., IPoIB appears as a standard NIC driver to the IP layer. The mid-layer (core modules) provides several services:

- SMA (Subnet Management Agent): introduced in Section 3.1.

- PMA (Performance Management Agent): collects performance information from management packets.
- CM (Communications Manager): manages connections from clients.
- SA (Subnet Administrator) Client: enables the client to communicate with the Subnet Administrator. The SA holds the path records.
- MAD (Management Datagram) Services: allows a client to access special queue pairs (QPs).
- Verbs: a verb is a semantic description of an action which is used to interact with the InfiniBand network. The corresponding APIs are not defined in the InfiniBand architecture; APIs are available from other suppliers, such as the OpenFabrics Alliance.

The lowest layer consists of the HCA driver, which provides the verbs for its specific device. [12], [7]

3.6 Possible Advantages of InfiniBand

Because InfiniBand competes with Ethernet as interconnect, this section presents possible advantages of InfiniBand over Ethernet.

Contrary to routers in Ethernet based networks, routers in IB networks are less important. Ethernet based networks are typically divided into relatively small subnets for performance reasons. InfiniBand, however, allows a single subnet to run tens of thousands of nodes efficiently, decreasing the need for routers. Another possible advantage of InfiniBand over Ethernet is that an InfiniBand fabric, by default, is a lossless fabric. Packets are usually not dropped on the link layer. This is realized by a credit system which determines for each node how much data it is allowed to send. This avoids congestion and may allow for better use of the available bandwidth. [5]

InfiniBand is capable of supporting IP traffic via IP over InfiniBand (IPoIB). IPoIB tunnels IP packets over InfiniBand capable hardware. To Linux, IPoIB appears as a kernel level driver and relays the extracted packets to the standard network stack at the IP level. Therefore, applications which use standard Linux network services can also use InfiniBand without any modifications to the application itself. This may lead to some speed-up; however, as the application is unaware of running over InfiniBand, it cannot make full use of IB's features. It is advised not to use this technique for time critical applications. [12]

InfiniBand is especially suited for tasks that require very low latency, such as HPC and high-frequency trading (HFT). RFC-based latency tests measured the RTT for 10 Gb/s Ethernet to be between 5 and 50 µs, whereas InfiniBand QDR could achieve latencies below 3 µs. Mellanox's website states that InfiniBand FDR achieves latencies as low as 0.7 µs and IB QDR 1.2 µs. [13]

Chapter 4

Vendors

There is a wide variety of manufacturers and vendors of networking equipment such as NICs. For this thesis, we decided to focus on cards from Intel and Mellanox. The choice in our context was limited by the requirement that the NIC must be supported by DPDK. Nevertheless, there would have been several alternative choices. These include the Terminator 5 from Chelsio Communications and the NetXtreme-E from Broadcom. Both NICs are available in versions supporting 10 GbE, similar to the NICs we chose for testing. A full list of supported NICs can be found on DPDK's website. In the following sections, we present a short overview of Intel and Mellanox.

4.1 Intel

Intel's current portfolio for networking related hardware consists of two major segments: Ethernet Products and Fabric Products. The latter is mostly intended for high-performance computing and features the Omni-Path technology. Intel also provides True Scale Fabric products which utilize InfiniBand [14]. The NICs which are the focus of this paper are part of the Ethernet Products segment and do not support InfiniBand. Intel currently provides 1/10/25/40 GbE network adapters. Table 4.1 shows the evolution of Intel's network interface controller series with a throughput of at least 10 Gb/s.

Controller Series   Year      Speed                 Ethernet Ports   Status
Intel …             N/A       N/A                   1                End of Life
Intel …             …         N/A                   2                End of Life
Intel 82599         …/11      10 Gb/s               1/2              Launched
Intel X540          …         100 Mb/s, 1/10 Gb/s   2                Launched
Intel X550          2015/16   1/2.5/5/10 Gb/s       1/2              Launched
Intel X710          …         10 Gb/s               1/2/4            Launched
Intel X…            …/15      1/10 Gb/s             2                Launched
Intel XL710         2014/15   1/10/40 Gb/s          1/2              Launched
Intel XXV710        …         1/10/25 Gb/s          1/2              Launched

Table 4.1: Intel 10/25/40 GbE Ethernet controllers. "End of Life" means that all controllers of this series are no longer supported by Intel. [15]

4.2 Mellanox

In 2006 Mellanox Technologies launched the ConnectX architecture, which has since gone through several iterations; see Table 4.2. The most recent network adapter is the ConnectX-6, which was revealed this year (2017). At the time this work was written it was not yet officially purchasable. With this new NIC, Mellanox provides 1/10/20/25/40/50/56/100/200 GbE network adapters (see Table 4.2).

Compared to Intel, Mellanox is a rather small company with around 2,900 employees [16]. Mellanox claims a market share of close to 90 percent in the market for adapters of 25 Gb/s or greater. This can partially be explained by the fact that this is a relatively new segment and that networking hardware with a throughput higher than 10 Gb/s is rarely used outside of high-performance computing environments and data centers. According to Mellanox, a considerable share of the TOP500 supercomputers were using InfiniBand as interconnect. Mellanox and Intel are the two main suppliers of InfiniBand based hardware. [16]

NIC              Year   Speed IB (VPI)   Speed Ethernet (VPI)   Speed Ethernet (EN)
ConnectX         2006   … Gb/s           N/A                    N/A
ConnectX-2       2009   40 Gb/s          10 Gb/s                N/A
ConnectX-3       2011   56 Gb/s          40 Gb/s                56 Gb/s
ConnectX-3 Pro   2013   56 Gb/s          40 Gb/s                56 Gb/s
ConnectX-4       2014   100 Gb/s         100 Gb/s               100 Gb/s
ConnectX-4 Lx    2016   N/A              N/A                    50 Gb/s
ConnectX-5       2016   100 Gb/s         100 Gb/s               100 Gb/s
ConnectX-6       2017   200 Gb/s         200 Gb/s               200 Gb/s

Table 4.2: Mellanox ConnectX series. Speed is always the highest possible per port. VPI NICs support InfiniBand and possibly Ethernet. EN versions of a NIC support only Ethernet. [16], [17], [18]

Chapter 5

Hardware

This thesis mainly focuses on a comparison of Intel and Mellanox network controllers. For this purpose, three network controllers have been selected:

- Intel Ethernet Controller X550T
- Intel Ethernet Controller XL710
- Mellanox ConnectX-4 Lx EN

This chapter aims to describe the hardware capabilities provided by each controller and to illustrate important differences.

5.1 General Information

This section provides information about the documentation which is available for the investigated NICs.

5.1.1 Data sheets

The data sheets for the Intel controllers discussed in this paper were released far earlier than the documentation from Mellanox. The XL710 data sheet was publicly released in July 2014 with revision 2.0. The current revision is 2.9, which was released in April 2017 [19]. The X550's documentation was released to the public on October 27, 2015 with revision 1.9. It is currently available in revision 2.1, which was released on May 10, 2016 [20]. Mellanox, on the contrary, released their Mellanox Adapters Programmer's Reference Manual (PRM) [21] in June 2016. This PRM is applicable to the ConnectX-4 as well as to the ConnectX-4 Lx.

Generally, the documentation provided by Intel is far more detailed than the documentation by Mellanox. Intel provides detailed information for every pin and includes mechanical and electrical specifications. Neither of those can be found in the current Mellanox PRM. However, some of this information about the hardware of the Mellanox card can be found in the user manual for the ConnectX-4 Lx [22], which has been available since November 2015 (revision 1.0) and is currently available in a newer revision.

5.1.2 Byte ordering

The network byte order is always big endian: fields of multiple bytes are transferred with the most significant byte first in Ethernet. The Intel X550T does not specify one particular byte order; registers which are transferred over the wire tend to be big endian, whereas others tend to be little endian [20]. The Intel XL710 specifies most registers and structures as little endian. However, buffers which are received or transmitted and structures containing MAC addresses are stored in big endian format. Additionally, there are structures which mix big and little endianness, e.g., when storing type-length-value structures, big endianness is used on the word level but bytes within words are stored in little endian. Contrary to Intel's cards, the ConnectX-4 Lx uses only big endian byte ordering.

5.2 External Interfaces

This section gives a short overview of the external interfaces provided by each NIC. Not explicitly mentioned below, but also available on all three NICs, are JTAG and Flash interfaces. [19], [20], [21]

5.2.1 Network Interface

NICs of the X550 family are available in one or two port versions. Each interface can be operated at 10/5/2.5/1 Gb/s or 100 Mb/s independently. The highest possible throughput is 20 Gb/s for two ports running at their maximal data rate. Our X550T is a two port version.

The 710 series is also available in one or two port versions. Internally, the 710 series features four Ethernet MAC ports which can be configured in 40 Gb/s or in 10 Gb/s mode (see Figure 5.1). In 40 Gb/s mode, MAC ports 0 and 1 can be used with 1/10/40 Gb/s operation while MAC ports 2 and 3 are disabled. In 10 Gb/s mode, all four MAC ports can be configured to run at 1/10 Gb/s independently. This gives a theoretical throughput of 80 Gb/s in 40 Gb/s mode.

The ConnectX-4 Lx comes in one or two port versions; each port can run at 10/25/40/50/56/100 Gb/s. This gives a maximum throughput of 200 Gb/s.

Figure 5.1: 710 series MAU lane to physical lane mapping. In 40 Gb/s mode, MAC ports 0 and 1 can use up to four MAU lanes while MAC ports 2 and 3 are deactivated. In 10 Gb/s mode, each MAC port utilizes one MAU lane. Each physical port consists of four SerDes lanes. Based on Figures 3-4 and 3-6 from [19]

5.2.2 PCI Express

The PCI Express interface is used as the main host interface on all three NICs. The Intel X550T provides a PCIe 3.0 x4 interface running at a maximum of 8 GT/s. The line code used in PCIe 3.0 is 128b/130b, so with 4 lanes the maximum throughput is 3938 MB/s. This equals 31.5 Gb/s, which is higher than the 20 Gb/s that can be supported by the network interface. The XL710 implements 8 lanes for the same PCIe version. With PCIe 3.0 x8 it also supports 8 GT/s and can achieve burst rates of 7877 MB/s. The 710 series theoretically supports two 40 GbE connections when using two ports (with four ports, each port can operate at a maximum of 10 Gb/s). However, the maximum throughput of the PCIe connection lies at about 63 Gb/s, which is considerably less than the required 80 Gb/s. In reality, the 710 series datasheet states that the total throughput is limited to 40 Gb/s even when using two 40 GbE connections. The Mellanox ConnectX-4 Lx, on the other hand, implements PCIe 3.0 x16, giving it a maximum throughput of 15754 MB/s. This equals about 126 Gb/s, which is also lower than the 200 Gb/s which could be handled by the network interfaces.
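The throughput figures above follow directly from the transfer rate, the lane count, and the 128b/130b line code; as a worked example (not spelled out in the datasheets' prose):

\[
R_{x4} = 8\ \mathrm{GT/s} \cdot 4 \cdot \tfrac{128}{130} \approx 31.5\ \mathrm{Gb/s} \approx 3938\ \mathrm{MB/s},
\qquad
R_{x16} = 8\ \mathrm{GT/s} \cdot 16 \cdot \tfrac{128}{130} \approx 126\ \mathrm{Gb/s} \approx 15754\ \mathrm{MB/s}.
\]

These are raw link-layer rates per direction; PCIe transaction-layer overhead reduces the usable payload bandwidth further.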

5.2.3 NC-SI

All three NICs also provide an NC-SI interface which can be used to connect a NIC to a Baseboard Management Controller. Via this interface, out-of-band management is possible. The X550T, the XL710, and the ConnectX-4 Lx each state compliance with the NC-SI specification; the XL710 can operate as either version 1.0.0 or 1.1.0, which can be checked via the NC-SI version flag.

5.2.4 SMBus

The use cases of SMBus are equivalent to those of NC-SI: out-of-band configuration between the network controller and a Baseboard Management Controller. SMBus is fully compatible with I2C. All NICs which are being tested here provide an SMBus interface.

5.2.5 General-Purpose I/O

The general-purpose I/O (GPIO) pins can be used for miscellaneous tasks, controlling LEDs, or other hardware and software. For instance, GPIO pins can be used to connect to an IEEE 1588 (see Section 5.3.2) auxiliary device, e.g., a GPS receiver providing a pulse-per-second (PPS) signal which allows for high precision synchronization. Contrary to the ConnectX-4 Lx, the Intel NICs also provide dedicated LED output pins, 4 for the X550T and 8 for the XL710.

5.3 Hardware Features

In this section, selected hardware features will be discussed. The main focus lies on the differences between the three NICs chosen for this thesis. The selection of those features was based on the requirements of MoonGen. Features discussed here are supported by the hardware but not necessarily by the drivers, especially DPDK's poll mode drivers.

5.3.1 Checksum Offloads

Checksums are used in different protocols and in different layers of the standard OSI model, e.g., the L3 checksum for IPv4 and the L4 checksum for TCP/UDP. The main purpose of checksums is to verify the integrity of a message by calculating a value based on the content of a sent packet. The receiving party recalculates this value and compares it to the transmitted one. If both values match, chances are high that the packet was transmitted intact. Offloading this calculation to the hardware is particularly efficient, as packets with invalid checksums can be dropped immediately by the hardware and therefore do not occupy the NIC or the operating system's network stack. Checksum offloading can be provided for transmitting as well as receiving packets. In the case of MoonGen, it is also important to be able to disable checksum offloading, as MoonGen deliberately sends packets with invalid checksums to achieve accurate user-defined packet rates. The capabilities of the NICs regarding receive offloads are discussed in greater detail in the following.

The X550T provides three different checksum calculations, namely the fragment checksum, the IPv4 header checksum, and the TCP/UDP checksum. Generally, the NIC reports the success of the calculation, i.e., that the computed and the read checksum match, in the ERROR field of the corresponding receive descriptor. A descriptor is a software construct which is associated with data and contains information on how the NIC needs to further process this data. Failure to calculate a matching checksum for the IP header results in the IPv4 Checksum Error bit being set. Similarly, failure to compute the correct UDP/TCP checksum is reported via the TCP/UDP Checksum Error bit. Additionally, the X550T supports the network virtualization technologies VXLAN and NVGRE. Both technologies encapsulate a second packet within the first. This means that there are now inner and outer IP headers and, therefore, also checksums. Checksum offloading for both inner and outer IP headers is supported for both protocols as long as the inner and outer headers are IPv4 headers. If the inner packet is an IPv6 packet, it is omitted from checksum offloading. For NVGRE, only the inner packet can contain TCP/UDP headers; its hardware checksum computation is supported. VXLAN does feature an outer UDP header; however, its checksum is not used and it will not be calculated by hardware. For SCTP packets of the format specified in table 7-26 of [20], CRC32 checksum offloading can be performed. Under special circumstances, checksum offloading for fragmented UDP packets is also supported. For the full list of supported receive checksum capabilities, refer to table 7-25 of [20]. Transmit capabilities are listed in table 7-45 of the documentation. [23], [24]

The 710 series supports the same three types of checksums. If the L3 and L4 headers meet the requirements of the NIC, the L3L4P flag in the corresponding receive descriptor is set, indicating that the NIC is performing additional processing. If the computed checksum does not match the checksum provided in the packet, this is reported via the IPE flag for an IPv4 checksum error, the L4E flag for L4 integrity errors, and the EIPE flag for outer IPv4 checksum errors when tunneling. Similarly to the X550T, the 710 series offloads the SCTP CRC and also supports VXLAN and NVGRE tunneling headers. Contrary to the X550T, the Teredo tunneling header is also explicitly supported. L4 checksum offloading for fragmented packets is not possible for IPv6 packets. The full list of supported receive checksum offloads can be found in table 8-15 of [19]. Transmit offloads are mostly identical and are described in the corresponding section of the NIC's documentation.

The information provided by the Mellanox documentation about this topic is sparse. However, it is stated that receive/transmit checksum offloading is supported for IPv4 and also for TCP/UDP, regardless of whether the underlying header is IPv4 or IPv6. Tunneling, regardless of the protocol, prevents any checksum offloading. SCTP is also not supported. In this regard, the Mellanox NIC is lacking in contrast to the Intel NICs.
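How these vendor-specific descriptor bits surface to software is unified by DPDK: the poll mode drivers translate them into mbuf offload flags. The following C sketch is only an illustration of that pattern, not code from MoonGen; the flag and struct names follow the DPDK releases current at the time of writing, and a real transmit path must also prepare the packet headers as required by the PMD.

    #include <rte_mbuf.h>
    #include <rte_ether.h>
    #include <rte_ip.h>

    /* Rx side: the driver maps the NIC's checksum error bits to ol_flags. */
    static int rx_checksums_ok(const struct rte_mbuf *m)
    {
        return !(m->ol_flags & (PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD));
    }

    /* Tx side: ask the hardware to insert the IPv4 and UDP checksums.
     * Most PMDs expect the L4 checksum field to be pre-seeded with the
     * pseudo-header checksum before transmission. */
    static void request_tx_checksum_offload(struct rte_mbuf *m)
    {
        m->l2_len = sizeof(struct ether_hdr);
        m->l3_len = sizeof(struct ipv4_hdr);
        m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM;
    }

MoonGen's ability to send deliberately invalid checksums amounts to not setting these transmit flags and writing arbitrary checksum values directly into the packet buffer.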

5.3.2 Timestamping

Precise and accurate timestamps are needed for latency and other performance measurements on networks with high throughput. Software timestamps are often inaccurate, as software has no control over when a packet really leaves the network port. The hardware itself, on the other hand, is able to timestamp packets as they are leaving the device without influencing any of the time-related characteristics. In order to make use of such timestamps, the clocks of the sending and the receiving device must be synchronized. This is done via PTP, based on IEEE 1588 and 802.1AS. The time synchronization phase is depicted in Figure 5.2.

Figure 5.2: PTP sync flow. The slave's adjustment can be calculated as 1/2 [(T2 - T1) - (T4 - T3)]. This assumes that the transmission delay is the same for both directions. T1, T2, T3, and T4 are sampled by hardware. Based on Figure 8-11 from [19]

X550 based NICs use an 80 MHz clock for time syncing, which limits the accuracy to 12.5 ns. For the 710 series, the frequency of the MAC clock on a port depends on the configured link speed for that port (see Table 5.1). A higher clock frequency means higher precision, the highest possible precision being 0.8 ns.

Link speed   MAC clock frequency   Sampling clock precision
40 Gb/s      625 MHz               ± 0.8 ns
10 Gb/s      312.5 MHz             ± 1.6 ns
1 Gb/s       31.25 MHz             ± 16 ns

Table 5.1: 710 series MAC clock frequencies. Based on Table 8-24 from [19]

Both Intel cards use the same relative point in a packet for timestamping: between the last bit of the Ethernet start of frame delimiter and the first bit of the following octet. Timestamping is done as close to the physical interface transceiver as possible.
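The formula in the caption of Figure 5.2 follows from the assumption of a symmetric path delay; as a short derivation (not spelled out in the text), let d be the one-way transmission delay and \(\delta\) the slave's clock offset:

\[
T_2 = T_1 + d + \delta, \qquad T_4 = T_3 + d - \delta
\;\Longrightarrow\;
\delta = \tfrac{1}{2}\left[(T_2 - T_1) - (T_4 - T_3)\right].
\]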

Therefore, the X550 based controllers do not insert the timestamp into Tx packets. Instead, if the timestamp logic is enabled (controlled via the TSYNCTXCTL.EN bit) and the timestamp bit in the packet descriptor is set, the timestamp is saved in the TXSTMPL and TXSTMPH registers. For PTP time sync, software is responsible for reading these values and appending them to a Follow_Up packet. On the Rx side, packets are timestamped if timestamping is enabled via the TSYNCRXCTL.EN bit and the incoming frame matches the message type defined in the register RXMTRL and the field TSYNCRXCTL.TYPE. The timestamps are saved in the RXSTMPL and RXSTMPH registers. Independent of what the timestamps are used for, software should read the registers, as they are locked until software has accessed them. Otherwise, the hardware is unable to capture further timestamps. [20]

The 710 series operates in a similar way. Tx packets are timestamped if the TSYN bit in the transmit context descriptor is set and the queue is enabled for timestamping (TSYNENA flag in the transmit queue context). The timestamps are saved into the PRTTSYN_TXTIME_L and PRTTSYN_TXTIME_H registers. These registers are locked and must be unlocked by software access before the next packet to be timestamped arrives. Received packets are timestamped if they are recognized as PTP event packets. To recognize UDP packets, they must be directed to the correct port. The following ports are possible depending on the UDP_ENA bits:

- 00b - UDP packets are not recognized
- 01b - UDP port number is 0x013F
- 10b - UDP port number is 0x0140
- 11b - UDP port is either 0x013F or 0x0140

Additionally, the message type of the incoming packet must match the message type which is defined in the PRTTSYN_CTL1 register. Each port features four PRTTSYN_RXTIME[n] registers; timestamps are saved to currently free registers. They must be accessed by software to allow hardware to use them again. [19]

According to the associated user manual, the ConnectX-4 Lx complies with IEEE 1588v2 (which is not mentioned in the PRM) [22]. However, the timestamping process differs from the one implemented by Intel. The NIC features work queues (WQ) which contain a receive work queue (RQ) and a send work queue (SQ). Software posts work requests as work queue entries (WQE) to an RQ or SQ (see Figure 5.3). If not specified otherwise in the Ctrl Segment of the WQE, the NIC posts a report on completion to the completion queue (CQ). This report is called a completion queue entry (CQE). On creation of a CQE, a 64 bit timestamp is added, split across the lro_timestamp_value/timestamp_h (higher 32 bits) and lro_timestamp_echo/timestamp_l (lower 32 bits) fields of the CQE. If the lro_timestamp_is_valid bit is set, those fields do not hold the normal timestamp, but instead the large receive offload (LRO) timestamp value field and the timestamp echo reply of the last coalesced header.

Figure 5.3: HCA operation. Software and the Mellanox NIC access the WQ asynchronously. Explicitly passing ownership of WQEs prevents inconsistency. Based on [21]

To convert a timestamp to real time, it must be divided by the value given in the HCA_CAP.device_frequency field, with the output being a value in microseconds. To obtain a value in relation to the current time, the device clock must be synchronized with the wall clock. This can be achieved by querying the device clock via a PCI read of Init_segment.internal_timer_h and Init_segment.internal_timer_l. [21]

The Mellanox documentation does not include information about the precision and accuracy of those timestamps or of the internal device clock. It is also unclear whether the time between a packet leaving the port and the timestamping of the CQE is deterministic and how large the absolute time difference is.
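Regardless of these register-level differences, DPDK exposes IEEE 1588 timestamping through a small, uniform API where the PMD supports it (see the Timesync row in Table 6.3 below). The following C sketch merely illustrates that API under simplified assumptions; it is not MoonGen's actual implementation, which accesses some of the registers described above directly, and the flag names are those of the DPDK versions current at the time of writing.

    #include <time.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* Enable IEEE 1588 timestamping on a port (supported by ixgbe and i40e). */
    static int enable_timesync(uint16_t port_id)
    {
        return rte_eth_timesync_enable(port_id);
    }

    /* Mark an outgoing PTP packet so the NIC latches its Tx timestamp,
     * then read the timestamp back after transmission. */
    static int send_and_read_tx_timestamp(uint16_t port_id, struct rte_mbuf *m,
                                          struct timespec *ts)
    {
        m->ol_flags |= PKT_TX_IEEE1588_TMST;
        rte_eth_tx_burst(port_id, 0, &m, 1);
        /* Poll until the hardware has captured the timestamp. */
        while (rte_eth_timesync_read_tx_timestamp(port_id, ts) != 0)
            ;
        return 0;
    }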

Chapter 6

Drivers

The corresponding drivers for the three network interface controllers are as follows:

NIC                                Driver
Intel Ethernet Controller X550T    ixgbe
Intel Ethernet Controller XL710    i40e
Mellanox ConnectX-4 Lx EN          mlx5

Table 6.1: Tested NICs and their associated drivers

These drivers are shipped together with DPDK and provide DPDK callback functions.

6.1 Feature Overview

This section gives an overview and a short explanation of most of the features supported by the mlx5, i40e, and ixgbe drivers, as they are stated in the DPDK documentation. The sections dedicated to each driver describe selected features in greater detail.

While DPDK supports various processor architectures, the architecture must also be supported by the drivers.

Architecture   ixgbe   i40e   mlx5
ARMv7          …       …      …
ARMv8          yes     yes    -
Power8         -       yes    yes
x86-32         yes     yes    yes
x86-64         yes     yes    yes

Table 6.2: Support for CPU architectures on the ixgbe, i40e, and mlx5 drivers. Based on table 1.1 from [25]
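As noted above, these drivers register themselves with DPDK's ethdev layer, so an application can query at runtime which PMD is driving a given port. A minimal sketch (the printed driver names correspond to those registered in recent DPDK versions and are given only as examples):

    #include <stdio.h>
    #include <rte_ethdev.h>

    /* Query which poll mode driver is bound to a port at runtime. */
    static void print_driver_name(uint16_t port_id)
    {
        struct rte_eth_dev_info info;
        rte_eth_dev_info_get(port_id, &info);
        /* Prints e.g. "net_ixgbe", "net_i40e", or "net_mlx5". */
        printf("port %u is driven by %s\n", port_id, info.driver_name);
    }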

Table 6.3 lists additional features which are used by MoonGen/libmoon. The Timesync feature is used by MoonGen to timestamp packets in order to offer high precision latency measurements. Mellanox NICs do not offer this capability. Rate limitation in hardware enables setting the time between sent packets at a fine granularity. However, as mentioned in Chapter 2, MoonGen also offers software based rate control.

Feature               ixgbe   i40e   mlx5   Explanation
Packet type parsing   yes     yes    yes    Supports packet type parsing and returns a list of supported packet types
Timesync              yes     yes    -      Supports IEEE 1588/802.1AS timestamping
Rate limitation       yes     -      -      Supports Tx rate limitation for a queue (MoonGen implements rate limitation for the i40e driver)

Table 6.3: Other features of ixgbe, i40e, and mlx5 drivers. Based on table 1.1 from [25]

While the basic statistic fields are the same for all NICs, the values often have to be interpreted differently. MoonGen uses extended statistics and hardware registers to enable uniform statistics for better comparability. Their support is listed in Table 6.4.

Feature           ixgbe   i40e   mlx5   Explanation
Basic stats       yes     yes    yes    Supports basic statistics such as ipackets, opackets, ibytes, obytes, imissed, ierrors, oerrors, rx_nombuf
Extended stats    yes     yes    -      Supports extended statistics; the set of counters changes from driver to driver
Stats per queue   yes     -      yes    Supports configuring per-queue stat counter mapping
EEPROM dump       yes     -      -      Supports getting/setting device EEPROM data
Registers dump    yes     -      -      Supports retrieving device registers and register attributes (number of registers and register size)

Table 6.4: Available statistics information on ixgbe, i40e, and mlx5 drivers. Based on table 1.1 from [25]
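As an illustration of how these statistics are consumed through DPDK (the generic API is the same for all three drivers, even though the meaning of individual counters differs), consider the following C sketch; it is a generic example, not MoonGen code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <inttypes.h>
    #include <rte_ethdev.h>

    static void print_port_stats(uint16_t port_id)
    {
        /* Basic statistics: identical field names on every driver. */
        struct rte_eth_stats st;
        if (rte_eth_stats_get(port_id, &st) == 0)
            printf("ipackets=%" PRIu64 " opackets=%" PRIu64
                   " imissed=%" PRIu64 " ierrors=%" PRIu64 "\n",
                   st.ipackets, st.opackets, st.imissed, st.ierrors);

        /* Extended statistics: counter names are driver specific. */
        int n = rte_eth_xstats_get(port_id, NULL, 0);
        if (n <= 0)
            return;  /* e.g. mlx5 did not expose extended stats (Table 6.4) */
        struct rte_eth_xstat *xs = calloc(n, sizeof(*xs));
        struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));
        rte_eth_xstats_get_names(port_id, names, n);
        rte_eth_xstats_get(port_id, xs, n);
        for (int i = 0; i < n; i++)
            printf("%s = %" PRIu64 "\n", names[i].name, xs[i].value);
        free(xs);
        free(names);
    }

The per-driver differences in how the underlying hardware counters map to these fields are exactly what MoonGen's adjustments have to compensate for.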

Feature                ixgbe   i40e   mlx5   Explanation
Speed capabilities     -       -      yes    Reports the speed (throughput) capabilities of the device
Link status            yes     yes    yes    Reports link speed, state (up/down), and duplex mode
Link status event      yes     yes    yes    Supports Link Status Change interrupts
Rx descriptor status   yes     yes    yes    Supports checking the status of an Rx descriptor. When rx_descriptor_status is used, the status can be Available, Done, or Unavailable. When rx_descriptor_done is used, the status can be "DD bit is set" or "DD bit is not set"
Tx descriptor status   yes     yes    yes    Supports checking the status of a Tx descriptor. The status can be Full, Done, or Unavailable
FW version             yes     yes    -      Supports getting device hardware firmware information

Table 6.5: Available status information on ixgbe, i40e, and mlx5 drivers. Based on table 1.1 from [25]

Programmable hardware filters can be used to match packets which fulfill a set of properties and to perform actions on them, e.g., directing an incoming packet to a receive queue based on its Ethernet type. The DPDK flow API is explained in more detail in the filter sections below. When promiscuous mode is enabled, a NIC passes all received packets on to the CPU, even if they were directed to another device. This functionality can be used for packet sniffing. Receive Side Scaling (RSS) is used to evenly distribute packets across several queues. Each queue can be assigned to a different CPU to enhance performance on multi-processor systems. Distribution is often based on a hash function applied to selected fields of the L3/L4 headers. [26]
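To make the flow API concrete, the following C sketch installs a rule that directs all TCP-over-IPv4 packets to Rx queue 1. It is a generic illustration of the rte_flow interface introduced in DPDK 17.02, not code from MoonGen; whether a given pattern/action combination is accepted depends on the PMD and the underlying hardware filters.

    #include <rte_ethdev.h>
    #include <rte_flow.h>

    /* Direct every TCP-over-IPv4 packet received on the port to Rx queue 1. */
    static struct rte_flow *install_tcp_filter(uint16_t port_id)
    {
        struct rte_flow_attr attr = { .ingress = 1 };

        /* Match Ethernet / IPv4 / TCP without constraining any field values. */
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH  },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_TCP  },
            { .type = RTE_FLOW_ITEM_TYPE_END  },
        };

        struct rte_flow_action_queue queue = { .index = 1 };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        struct rte_flow_error err;
        if (rte_flow_validate(port_id, &attr, pattern, actions, &err) != 0)
            return NULL;  /* the PMD rejected this pattern/action combination */
        return rte_flow_create(port_id, &attr, pattern, actions, &err);
    }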


More information

HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS

HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS CS6410 Moontae Lee (Nov 20, 2014) Part 1 Overview 00 Background User-level Networking (U-Net) Remote Direct Memory Access

More information

QuickSpecs. Overview. HPE Ethernet 10Gb 2-port 535 Adapter. HPE Ethernet 10Gb 2-port 535 Adapter. 1. Product description. 2.

QuickSpecs. Overview. HPE Ethernet 10Gb 2-port 535 Adapter. HPE Ethernet 10Gb 2-port 535 Adapter. 1. Product description. 2. Overview 1. Product description 2. Product features 1. Product description HPE Ethernet 10Gb 2-port 535FLR-T adapter 1 HPE Ethernet 10Gb 2-port 535T adapter The HPE Ethernet 10GBase-T 2-port 535 adapters

More information

440GX Application Note

440GX Application Note Overview of TCP/IP Acceleration Hardware January 22, 2008 Introduction Modern interconnect technology offers Gigabit/second (Gb/s) speed that has shifted the bottleneck in communication from the physical

More information

URDMA: RDMA VERBS OVER DPDK

URDMA: RDMA VERBS OVER DPDK 13 th ANNUAL WORKSHOP 2017 URDMA: RDMA VERBS OVER DPDK Patrick MacArthur, Ph.D. Candidate University of New Hampshire March 28, 2017 ACKNOWLEDGEMENTS urdma was initially developed during an internship

More information

Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability

Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Mellanox InfiniBand Host Channel Adapters (HCA) enable the highest data center

More information

Ron Emerick, Oracle Corporation

Ron Emerick, Oracle Corporation PCI Express PRESENTATION Virtualization TITLE GOES HERE Overview Ron Emerick, Oracle Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted.

More information

PARAVIRTUAL RDMA DEVICE

PARAVIRTUAL RDMA DEVICE 12th ANNUAL WORKSHOP 2016 PARAVIRTUAL RDMA DEVICE Aditya Sarwade, Adit Ranadive, Jorgen Hansen, Bhavesh Davda, George Zhang, Shelley Gong VMware, Inc. [ April 5th, 2016 ] MOTIVATION User Kernel Socket

More information

OpenFabrics Interface WG A brief introduction. Paul Grun co chair OFI WG Cray, Inc.

OpenFabrics Interface WG A brief introduction. Paul Grun co chair OFI WG Cray, Inc. OpenFabrics Interface WG A brief introduction Paul Grun co chair OFI WG Cray, Inc. OFI WG a brief overview and status report 1. Keep everybody on the same page, and 2. An example of a possible model for

More information

Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA

Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Pak Lui, Gilad Shainer, Brian Klaff Mellanox Technologies Abstract From concept to

More information

Learning with Purpose

Learning with Purpose Network Measurement for 100Gbps Links Using Multicore Processors Xiaoban Wu, Dr. Peilong Li, Dr. Yongyi Ran, Prof. Yan Luo Department of Electrical and Computer Engineering University of Massachusetts

More information

Red Hat Enterprise Linux (RHEL) 7.5-ALT Driver Release Notes

Red Hat Enterprise Linux (RHEL) 7.5-ALT Driver Release Notes Red Hat Enterprise Linux (RHEL) 7.5-ALT Driver Release Notes RHEL 7.5-ALT www.mellanox.com Mellanox Technologies NOTE: THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT ( PRODUCT(S) ) AND ITS RELATED DOCUMENTATION

More information

SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture

SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture 2012 MELLANOX TECHNOLOGIES 1 SwitchX - Virtual Protocol Interconnect Solutions Server / Compute Switch / Gateway Virtual Protocol Interconnect

More information

InfiniBand SDR, DDR, and QDR Technology Guide

InfiniBand SDR, DDR, and QDR Technology Guide White Paper InfiniBand SDR, DDR, and QDR Technology Guide The InfiniBand standard supports single, double, and quadruple data rate that enables an InfiniBand link to transmit more data. This paper discusses

More information

Future Routing Schemes in Petascale clusters

Future Routing Schemes in Petascale clusters Future Routing Schemes in Petascale clusters Gilad Shainer, Mellanox, USA Ola Torudbakken, Sun Microsystems, Norway Richard Graham, Oak Ridge National Laboratory, USA Birds of a Feather Presentation Abstract

More information

Introduction to Ethernet Latency

Introduction to Ethernet Latency Introduction to Ethernet Latency An Explanation of Latency and Latency Measurement The primary difference in the various methods of latency measurement is the point in the software stack at which the latency

More information

Workshop on High Performance Computing (HPC) Architecture and Applications in the ICTP October High Speed Network for HPC

Workshop on High Performance Computing (HPC) Architecture and Applications in the ICTP October High Speed Network for HPC 2494-6 Workshop on High Performance Computing (HPC) Architecture and Applications in the ICTP 14-25 October 2013 High Speed Network for HPC Moreno Baricevic & Stefano Cozzini CNR-IOM DEMOCRITOS Trieste

More information

Mellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007

Mellanox Technologies Maximize Cluster Performance and Productivity. Gilad Shainer, October, 2007 Mellanox Technologies Maximize Cluster Performance and Productivity Gilad Shainer, shainer@mellanox.com October, 27 Mellanox Technologies Hardware OEMs Servers And Blades Applications End-Users Enterprise

More information

Supplement to InfiniBand TM Architecture Specification Volume 1 Release 1.2. Annex A11: RDMA IP CM Service. September 8, 2006

Supplement to InfiniBand TM Architecture Specification Volume 1 Release 1.2. Annex A11: RDMA IP CM Service. September 8, 2006 Supplement to InfiniBand TM Architecture Specification Volume Release. Annex A: RDMA IP CM Service September, 0 Copyright 0 by InfiniBand TM Trade Association. All rights reserved. All trademarks and brands

More information

Creating an agile infrastructure with Virtualized I/O

Creating an agile infrastructure with Virtualized I/O etrading & Market Data Agile infrastructure Telecoms Data Center Grid Creating an agile infrastructure with Virtualized I/O Richard Croucher May 2009 Smart Infrastructure Solutions London New York Singapore

More information

Networking at the Speed of Light

Networking at the Speed of Light Networking at the Speed of Light Dror Goldenberg VP Software Architecture MaRS Workshop April 2017 Cloud The Software Defined Data Center Resource virtualization Efficient services VM, Containers uservices

More information

PCI Express x8 Single Port SFP+ 10 Gigabit Server Adapter (Intel 82599ES Based) Single-Port 10 Gigabit SFP+ Ethernet Server Adapters Provide Ultimate

PCI Express x8 Single Port SFP+ 10 Gigabit Server Adapter (Intel 82599ES Based) Single-Port 10 Gigabit SFP+ Ethernet Server Adapters Provide Ultimate NIC-PCIE-1SFP+-PLU PCI Express x8 Single Port SFP+ 10 Gigabit Server Adapter (Intel 82599ES Based) Single-Port 10 Gigabit SFP+ Ethernet Server Adapters Provide Ultimate Flexibility and Scalability in Virtual

More information

SUSE Linux Enterprise Server (SLES) 15 Inbox Driver Release Notes SLES 15

SUSE Linux Enterprise Server (SLES) 15 Inbox Driver Release Notes SLES 15 SUSE Linux Enterprise Server (SLES) 15 Inbox Driver Release Notes SLES 15 www.mellanox.com Mellanox Technologies NOTE: THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT PRODUCT(S) ᶰ AND ITS RELATED DOCUMENTATION

More information

Welcome to the IBTA Fall Webinar Series

Welcome to the IBTA Fall Webinar Series Welcome to the IBTA Fall Webinar Series A four-part webinar series devoted to making I/O work for you Presented by the InfiniBand Trade Association The webinar will begin shortly. 1 September 23 October

More information

Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances

Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances Technology Brief Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances The world

More information

Networking for Data Acquisition Systems. Fabrice Le Goff - 14/02/ ISOTDAQ

Networking for Data Acquisition Systems. Fabrice Le Goff - 14/02/ ISOTDAQ Networking for Data Acquisition Systems Fabrice Le Goff - 14/02/2018 - ISOTDAQ Outline Generalities The OSI Model Ethernet and Local Area Networks IP and Routing TCP, UDP and Transport Efficiency Networking

More information

The Exascale Architecture

The Exascale Architecture The Exascale Architecture Richard Graham HPC Advisory Council China 2013 Overview Programming-model challenges for Exascale Challenges for scaling MPI to Exascale InfiniBand enhancements Dynamically Connected

More information

Infiniband Fast Interconnect

Infiniband Fast Interconnect Infiniband Fast Interconnect Yuan Liu Institute of Information and Mathematical Sciences Massey University May 2009 Abstract Infiniband is the new generation fast interconnect provides bandwidths both

More information

Evaluating the Impact of RDMA on Storage I/O over InfiniBand

Evaluating the Impact of RDMA on Storage I/O over InfiniBand Evaluating the Impact of RDMA on Storage I/O over InfiniBand J Liu, DK Panda and M Banikazemi Computer and Information Science IBM T J Watson Research Center The Ohio State University Presentation Outline

More information

RDMA programming concepts

RDMA programming concepts RDMA programming concepts Robert D. Russell InterOperability Laboratory & Computer Science Department University of New Hampshire Durham, New Hampshire 03824, USA 2013 Open Fabrics Alliance,

More information

Study. Dhabaleswar. K. Panda. The Ohio State University HPIDC '09

Study. Dhabaleswar. K. Panda. The Ohio State University HPIDC '09 RDMA over Ethernet - A Preliminary Study Hari Subramoni, Miao Luo, Ping Lai and Dhabaleswar. K. Panda Computer Science & Engineering Department The Ohio State University Introduction Problem Statement

More information

PE310G4TSF4I71 Quad Port SFP+ 10 Gigabit Ethernet PCI Express Time Stamp Server Adapter Intel Based

PE310G4TSF4I71 Quad Port SFP+ 10 Gigabit Ethernet PCI Express Time Stamp Server Adapter Intel Based PE310G4TSF4I71 Quad Port SFP+ 10 Gigabit Ethernet PCI Express Time Stamp Server Adapter Intel Based Product Description Silicom s 40 Gigabit Ethernet PCI Express Time Stamping server adapter is designed

More information

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme NET1343BU NSX Performance Samuel Kommu #VMworld #NET1343BU Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no

More information

Industry Standards for the Exponential Growth of Data Center Bandwidth and Management. Craig W. Carlson

Industry Standards for the Exponential Growth of Data Center Bandwidth and Management. Craig W. Carlson Industry Standards for the Exponential Growth of Data Center Bandwidth and Management Craig W. Carlson 2 Or Finding the Fat Pipe through standards Creative Commons, Flikr User davepaker Overview Part of

More information

Master Course Computer Networks IN2097

Master Course Computer Networks IN2097 Chair for Network Architectures and Services Prof. Carle Department for Computer Science TU München Master Course Computer Networks IN2097 Chapter 7 - Network Measurements Introduction Architecture & Mechanisms

More information

MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구

MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 Leading Supplier of End-to-End Interconnect Solutions Analyze Enabling the Use of Data Store ICs Comprehensive End-to-End InfiniBand and Ethernet Portfolio

More information

by Brian Hausauer, Chief Architect, NetEffect, Inc

by Brian Hausauer, Chief Architect, NetEffect, Inc iwarp Ethernet: Eliminating Overhead In Data Center Designs Latest extensions to Ethernet virtually eliminate the overhead associated with transport processing, intermediate buffer copies, and application

More information

Data Sheet FUJITSU PLAN EP Intel X710-DA2 2x10GbE SFP+

Data Sheet FUJITSU PLAN EP Intel X710-DA2 2x10GbE SFP+ Data Sheet FUJITSU PLAN EP Intel X710-DA2 2x10GbE SFP+ Data Sheet FUJITSU PLAN EP Intel X710-DA2 2x10GbE SFP+ Dual-port 10 Gbit PCIe 3.0 Ethernet cards enable data exchange between all the devices connected

More information

FPGA Implementation of RDMA-Based Data Acquisition System Over 100 GbE

FPGA Implementation of RDMA-Based Data Acquisition System Over 100 GbE 1 FPGA Implementation of RDMA-Based Data Acquisition System Over 100 GbE Wassim Mansour, Member, IEEE, Nicolas Janvier, Member, IEEE, and Pablo Fajardo Abstract This paper presents an RDMA over Ethernet

More information

Master Course Computer Networks IN2097

Master Course Computer Networks IN2097 Chair for Network Architectures and Services Prof. Carle Department for Computer Science TU München Master Course Computer Networks IN2097 Prof. Dr.-Ing. Georg Carle Christian Grothoff, Ph.D. Dr. Nils

More information

Fermi Cluster for Real-Time Hyperspectral Scene Generation

Fermi Cluster for Real-Time Hyperspectral Scene Generation Fermi Cluster for Real-Time Hyperspectral Scene Generation Gary McMillian, Ph.D. Crossfield Technology LLC 9390 Research Blvd, Suite I200 Austin, TX 78759-7366 (512)795-0220 x151 gary.mcmillian@crossfieldtech.com

More information

Memory Management Strategies for Data Serving with RDMA

Memory Management Strategies for Data Serving with RDMA Memory Management Strategies for Data Serving with RDMA Dennis Dalessandro and Pete Wyckoff (presenting) Ohio Supercomputer Center {dennis,pw}@osc.edu HotI'07 23 August 2007 Motivation Increasing demands

More information

Storage Protocol Offload for Virtualized Environments Session 301-F

Storage Protocol Offload for Virtualized Environments Session 301-F Storage Protocol Offload for Virtualized Environments Session 301-F Dennis Martin, President August 2016 1 Agenda About Demartek Offloads I/O Virtualization Concepts RDMA Concepts Overlay Networks and

More information

Agilio CX 2x40GbE with OVS-TC

Agilio CX 2x40GbE with OVS-TC PERFORMANCE REPORT Agilio CX 2x4GbE with OVS-TC OVS-TC WITH AN AGILIO CX SMARTNIC CAN IMPROVE A SIMPLE L2 FORWARDING USE CASE AT LEAST 2X. WHEN SCALED TO REAL LIFE USE CASES WITH COMPLEX RULES TUNNELING

More information

MoonGen: A Fast and Flexible Packet Generator

MoonGen: A Fast and Flexible Packet Generator MoonGen: A Fast and Flexible Packet Generator Paul Emmerich emmericp@net.in.tum.de Technical University of Munich Chair of Network Architectures and Services IETF-100, 16.11.2017 Research at net.in.tum

More information

Demystifying Network Cards

Demystifying Network Cards Demystifying Network Cards Paul Emmerich December 27, 2017 Chair of Network Architectures and Services About me PhD student at Researching performance of software packet processing systems Mostly working

More information

Advanced Computer Networks. Flow Control

Advanced Computer Networks. Flow Control Advanced Computer Networks 263 3501 00 Flow Control Patrick Stuedi Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Last week TCP in Datacenters Avoid incast problem - Reduce

More information

A-GEAR 10Gigabit Ethernet Server Adapter X520 2xSFP+

A-GEAR 10Gigabit Ethernet Server Adapter X520 2xSFP+ Product Specification NIC-10G-2BF A-GEAR 10Gigabit Ethernet Server Adapter X520 2xSFP+ Apply Dual-port 10 Gigabit Fiber SFP+ server connections, These Server Adapters Provide Ultimate Flexibility and Scalability

More information

Rapid prototyping of DPDK applications with libmoon

Rapid prototyping of DPDK applications with libmoon Rapid prototyping of DPDK applications with libmoon Paul Emmerich emmericp@net.in.tum.de Technical University of Munich Chair of Network Architectures and Services DPDK Summit, 27.9.2017 About me PhD student

More information

InfiniBand and Mellanox UFM Fundamentals

InfiniBand and Mellanox UFM Fundamentals InfiniBand and Mellanox UFM Fundamentals Part Number: MTR-IB-UFM-OST-A Duration: 3 Days What's in it for me? Where do I start learning about InfiniBand? How can I gain the tools to manage this fabric?

More information

IBM WebSphere MQ Low Latency Messaging Software Tested With Arista 10 Gigabit Ethernet Switch and Mellanox ConnectX

IBM WebSphere MQ Low Latency Messaging Software Tested With Arista 10 Gigabit Ethernet Switch and Mellanox ConnectX IBM WebSphere MQ Low Latency Messaging Software Tested With Arista 10 Gigabit Ethernet Switch and Mellanox ConnectX -2 EN with RoCE Adapter Delivers Reliable Multicast Messaging With Ultra Low Latency

More information

TLDK Overview. Transport Layer Development Kit Keith Wiles April Contributions from Ray Kinsella & Konstantin Ananyev

TLDK Overview. Transport Layer Development Kit Keith Wiles April Contributions from Ray Kinsella & Konstantin Ananyev TLDK Overview Transport Layer Development Kit Keith Wiles April 2017 Contributions from Ray Kinsella & Konstantin Ananyev Notices and Disclaimers Intel technologies features and benefits depend on system

More information

Key Measures of InfiniBand Performance in the Data Center. Driving Metrics for End User Benefits

Key Measures of InfiniBand Performance in the Data Center. Driving Metrics for End User Benefits Key Measures of InfiniBand Performance in the Data Center Driving Metrics for End User Benefits Benchmark Subgroup Benchmark Subgroup Charter The InfiniBand Benchmarking Subgroup has been chartered by

More information

PLUSOPTIC NIC-PCIE-2SFP+-V2-PLU

PLUSOPTIC NIC-PCIE-2SFP+-V2-PLU PLUSOPTIC NIC-PCIE-2SFP+-V2-PLU PCI Express v3.0 x8 Dual Port SFP+ 10 Gigabit Server Adapter (Intel X710- BM2 Based) Overview: NIC-PCIE-2SFP+-V2-PLU is PLUSOPTIC a new generation of high-performance server

More information

Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters

Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters Hari Subramoni, Ping Lai, Sayantan Sur and Dhabhaleswar. K. Panda Department of

More information

Multicomputer distributed system LECTURE 8

Multicomputer distributed system LECTURE 8 Multicomputer distributed system LECTURE 8 DR. SAMMAN H. AMEEN 1 Wide area network (WAN); A WAN connects a large number of computers that are spread over large geographic distances. It can span sites in

More information

Measuring MPLS overhead

Measuring MPLS overhead Measuring MPLS overhead A. Pescapè +*, S. P. Romano +, M. Esposito +*, S. Avallone +, G. Ventre +* * ITEM - Laboratorio Nazionale CINI per l Informatica e la Telematica Multimediali Via Diocleziano, 328

More information

Best Practices for Deployments using DCB and RoCE

Best Practices for Deployments using DCB and RoCE Best Practices for Deployments using DCB and RoCE Contents Introduction... Converged Networks... RoCE... RoCE and iwarp Comparison... RoCE Benefits for the Data Center... RoCE Evaluation Design... RoCE

More information

NVMe Direct. Next-Generation Offload Technology. White Paper

NVMe Direct. Next-Generation Offload Technology. White Paper NVMe Direct Next-Generation Offload Technology The market introduction of high-speed NVMe SSDs and 25/40/50/100Gb Ethernet creates exciting new opportunities for external storage NVMe Direct enables high-performance

More information

DB2 purescale: High Performance with High-Speed Fabrics. Author: Steve Rees Date: April 5, 2011

DB2 purescale: High Performance with High-Speed Fabrics. Author: Steve Rees Date: April 5, 2011 DB2 purescale: High Performance with High-Speed Fabrics Author: Steve Rees Date: April 5, 2011 www.openfabrics.org IBM 2011 Copyright 1 Agenda Quick DB2 purescale recap DB2 purescale comes to Linux DB2

More information

Choosing the Best Network Interface Card for Cloud Mellanox ConnectX -3 Pro EN vs. Intel XL710

Choosing the Best Network Interface Card for Cloud Mellanox ConnectX -3 Pro EN vs. Intel XL710 COMPETITIVE BRIEF April 5 Choosing the Best Network Interface Card for Cloud Mellanox ConnectX -3 Pro EN vs. Intel XL7 Introduction: How to Choose a Network Interface Card... Comparison: Mellanox ConnectX

More information

Module 2 Storage Network Architecture

Module 2 Storage Network Architecture Module 2 Storage Network Architecture 1. SCSI 2. FC Protocol Stack 3. SAN:FC SAN 4. IP Storage 5. Infiniband and Virtual Interfaces FIBRE CHANNEL SAN 1. First consider the three FC topologies pointto-point,

More information

ALLNET ALL0141-4SFP+-10G / PCIe 10GB Quad SFP+ Fiber Card Server

ALLNET ALL0141-4SFP+-10G / PCIe 10GB Quad SFP+ Fiber Card Server ALLNET ALL0141-4SFP+-10G / PCIe 10GB Quad SFP+ Fiber Card Server EAN CODE 4 0 3 8 8 1 6 0 6 9 1 2 2 Highlights: Quad-port 10GbE SFP+ server adapters PCI Express (PCIe) v3.0, 8.0 GT/s, x8 lanes SFP+ Connectivity

More information

HP Cluster Interconnects: The Next 5 Years

HP Cluster Interconnects: The Next 5 Years HP Cluster Interconnects: The Next 5 Years Michael Krause mkrause@hp.com September 8, 2003 2003 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

More information

14th ANNUAL WORKSHOP 2018 NVMF TARGET OFFLOAD. Liran Liss. Mellanox Technologies. April 2018

14th ANNUAL WORKSHOP 2018 NVMF TARGET OFFLOAD. Liran Liss. Mellanox Technologies. April 2018 14th ANNUAL WORKSHOP 2018 NVMF TARGET OFFLOAD Liran Liss Mellanox Technologies April 2018 AGENDA Introduction NVMe NVMf NVMf target driver Offload model Verbs interface Status 2 OpenFabrics Alliance Workshop

More information

Birds of a Feather Presentation

Birds of a Feather Presentation Mellanox InfiniBand QDR 4Gb/s The Fabric of Choice for High Performance Computing Gilad Shainer, shainer@mellanox.com June 28 Birds of a Feather Presentation InfiniBand Technology Leadership Industry Standard

More information

Meltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies

Meltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies Meltdown and Spectre Interconnect Evaluation Jan 2018 1 Meltdown and Spectre - Background Most modern processors perform speculative execution This speculation can be measured, disclosing information about

More information