Санкт-Петербургский Государственный Политехнический Университет
Факультет Технической Кибернетики
Кафедра Информационных Измерительных Технологий

Report on HPC
Topic: Interconnection Technologies of Cluster Networks

Prepared by: Ван Юйцзюань, group 6085/1
Reviewed by: Солнушкин К. С.

Санкт-Петербург

Chapter 1. Cluster Networks Concept

Commodity cluster solutions are viable today due to a number of factors, such as high-performance commodity servers and the availability of high-speed, low-latency network switch technologies that provide the inter-nodal communications. Commodity clusters typically incorporate one or more dedicated switches to support communication between the cluster nodes. The speed and type of the node interconnect vary based on the requirements of the application and organization. With today's low cost per port for Gigabit Ethernet switches, the adoption of 10-Gigabit Ethernet, and the standardization of 10/100/1000 network interfaces on node hardware, Ethernet continues to be a leading interconnect technology for many clusters. In addition to Ethernet, alternative interconnect technologies include Myrinet, Quadrics, and InfiniBand, which support bandwidths above 1 Gbps and end-to-end message latencies below 10 microseconds (usec).

1.1 Network Characterization

There are two primary characteristics establishing the operational properties of a network:

Bandwidth: measured in millions of bits per second (Mbps) or billions of bits per second (Gbps). Peak bandwidth is the maximum amount of data that can be transferred in a single unit of time through a single connection. Bisection bandwidth is the total peak bandwidth that can be passed across a single switch.

Latency: measured in microseconds (usec) or milliseconds (msec); it is the time it takes to move a single packet of information in one port and out of another. For parallel clusters, latency is measured as the time it takes for a message to be passed from one processor to another, including the latency of the interconnecting switch or switches. The latencies actually observed vary widely, even on a single switch, depending on characteristics such as packet size, switch architecture (centralized versus distributed), queuing, buffer depths and allocations, and protocol processing at the nodes.

Tab.1. Examples of some interconnection technologies
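Both quantities are usually measured with a simple ping-pong micro-benchmark between a pair of nodes: half the round-trip time of a very small message approximates the end-to-end latency, and the transfer rate of a large message approximates the achievable bandwidth. The following is a minimal sketch of such a test in C with MPI; it is only an illustration added here, not taken from any of the benchmark suites mentioned in this report, and the message sizes and repetition counts are arbitrary choices.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define SMALL 1              /* bytes, used for the latency estimate    */
    #define LARGE (1 << 20)      /* 1 MB, used for the bandwidth estimate   */
    #define REPS  1000

    /* Return the average one-way time per message of the given size. */
    static double pingpong(int rank, char *buf, int bytes, int reps)
    {
        MPI_Status st;
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        return (MPI_Wtime() - t0) / (2.0 * reps);
    }

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char *buf = malloc(LARGE);
        double lat = pingpong(rank, buf, SMALL, REPS);  /* one-way small-message time */
        double t   = pingpong(rank, buf, LARGE, 100);   /* one-way time for 1 MB      */

        if (rank == 0)
            printf("latency: %.2f us   bandwidth: %.1f MB/s\n",
                   lat * 1e6, LARGE / t / 1e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }

Run it with two processes placed on two different nodes, for example: mpirun -np 2 ./pingpong.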

Table 1 gives a summary of some interconnection technologies, which are then compared in Table 2. The comparison examines factors that include: bandwidth, latency, hardware availability, support for Linux, maximum number of cluster nodes supported, how the protocol is implemented, support for the Virtual Interface Architecture (VIA), and support for the Message Passing Interface (MPI).

Tab.2. Comparison of some interconnection technologies
(Peak bandwidth and MPI latency figures for 10G Ethernet were taken from published vendor results.)

  Criterion                Gigabit Ethernet        10G Ethernet  InfiniBand 4X  Myrinet              QsNet-II             SCI
  Interface                PCI                     PCI           PCIe           PCIe                 PCI-X 1.0/PCIe 1.0a  --
  Peak bandwidth (MB/s)    --                      --            --             --                   --                   <320
  MPI latency (us)         --                      --            <7             10                   <3                   1-2
  Hardware availability    Now                     Now           Now            Now                  Now                  Now
  Linux support            Now                     Now           Now            Now                  Now                  Now
  Max. number of nodes     1000s                   1000s         >1000s         1000s                --                   --
  Protocol implementation  Hardware                Firmware on adaptor  Hardware  Firmware on adaptor  Firmware on adaptor  Hardware
  VIA support              NT/Linux                --            Linux          Software             None                 Software
  MPI support              MVICH over M-VIA, TCP   --            MPI/Pro        3rd party            Quadrics             3rd party

Chapter 2. Classification of Cluster Network Interconnects

2.1 Gigabit Ethernet and 10 Gigabit Ethernet

Gigabit Ethernet (GbE or 1 GigE)

Gigabit Ethernet is a term describing various technologies for transmitting Ethernet frames at a rate of a gigabit per second, as defined by the IEEE 802.3 standard. Half-duplex gigabit links connected through hubs are allowed by the specification, but in the marketplace full duplex with switches is the norm.
Network topology: star. Speed: 1000 Mbit/s.

10 Gigabit Ethernet

Here we will discuss the performance of the first Quadrics 10 Gb/s Ethernet (10 GbE) switch with Myricom Myri-10G adapters, also using socket and MPI micro-benchmarks. In order to evaluate the

performance of MPI, we used one of the most well-known sets of micro-benchmarks: the Ohio State University (OSU) MPI benchmarks. The tests lead to the following conclusions: using sockets, the network is able to deliver a UDP bandwidth of up to 9.92 Gb/s and a TCP bandwidth of 9.89 Gb/s, i.e. full utilization of the available bandwidth. In terms of latency, TCP latency goes as low as 9.89 us, on top of which a switching latency as low as 470 ns on the shortest switch path, and up to 1.6 us on the longest, has to be added. Using MPI only slightly decreases bandwidth, down to 9.61 Gb/s and 8.93 Gb/s for 128 KB messages, while latency is degraded to 12 us. The switch adds the same small extra latency, between 470 ns and 1.6 us.

Here I want to introduce an Ethernet switch, the QS-TG108. It is an 8U chassis providing 10 Gb/s Ethernet ports and utilises the new low-cost 10 Gb/s Ethernet specification 10GBASE-CX4. This switch is qualified for use with 10 Gb/s Ethernet adapters from Intel, Chelsio, Myricom and NetEffect.

10 Gigabit Ethernet supports many physical-layer standards, which are used to transmit information over different distances. Their features are listed in Table 3.

Tab.3. 10G Ethernet physical-layer standards

  Nomenclature        Distance                  Media                          Note
  10GBASE-CX4         15 m                      Copper                         Uses the InfiniBand 4x connector and CX4 cable
  10GBASE-SR          26-82 m / 300 m           Multimode fiber                300 m over 2 GHz*km multimode fiber
  10GBASE-LX4         300 m / 10 km             Multimode / singlemode fiber
  10GBASE-LR/ER       10 km / 40 km             Singlemode fiber
  10GBASE-SW/LW/EW    same as 10GBASE-SR/LR/ER  Fiber
  10GBASE-T           55 m / 100 m              Category 6 / Category 6a cabling

2.2 Scalable Coherent Interface (SCI)

Scalable Coherent Interface is the oldest of the networks discussed here. SCI was largely designed to avoid the usual bus limitations, which resulted in a ring structure to connect the hosts. This follows from one of the design principles of SCI: signals flow continuously in one direction, enabling a very high signal rate without noise interference. SCI nodes are interconnected through unidirectional point-to-point links in a ring/ringlet topology. Switches are used to connect multiple independent SCI ringlets. A special feature of SCI is its ability to keep the caches of the processors it connects coherent. In clusters, SCI is always used as just an internode network. Because of the ring structure of SCI, this means that the network is arranged as a 1-D, 2-D, or 3-D torus, as shown in Figure 1.
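Applications running on such a torus usually arrange their processes in a matching Cartesian grid, so that most communication stays between ring neighbours. The short sketch below is an illustration added here (not part of the original report); it uses MPI's Cartesian-topology routines to build a 2-D periodic grid, i.e. a 2-D torus, and to look up each process's neighbours.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int dims[2] = {0, 0};            /* let MPI choose a 2-D factorization of size */
        MPI_Dims_create(size, 2, dims);

        int periods[2] = {1, 1};         /* wrap-around in both dimensions = torus */
        MPI_Comm torus;
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &torus);

        int trank, coords[2], left, right, down, up;
        MPI_Comm_rank(torus, &trank);
        MPI_Cart_coords(torus, trank, 2, coords);
        MPI_Cart_shift(torus, 0, 1, &left, &right);   /* ring neighbours along dimension 0 */
        MPI_Cart_shift(torus, 1, 1, &down, &up);      /* ring neighbours along dimension 1 */

        printf("rank %d at (%d,%d): left=%d right=%d down=%d up=%d\n",
               trank, coords[0], coords[1], left, right, down, up);

        MPI_Comm_free(&torus);
        MPI_Finalize();
        return 0;
    }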

Figure.1. Topology of SCI

The torus network has some risks in comparison with other networks such as the fat tree or the Clos network: when a node adaptor fails, it incapacitates all the nodes that share the ring on which it lies. For that reason Dolphin, one of the SCI vendors, provides software to reconfigure the torus in such a way that the minimal number of nodes becomes unavailable. Furthermore, because of the torus topology it is not possible to add or remove an arbitrary number of nodes in the cluster.

Performance of SCI-based clusters:
Network topology: 1-D, 2-D, or 3-D torus.
Bandwidth: up to about 320 MB/s in a ping-pong experiment.
MPI latency: 1-2 us for small messages.

2.3 Myrinet

Nowadays one of the most widely deployed interconnect technologies is Myrinet. At present, Myricom supplies Myrinet components and software in two series: Myrinet-2000 and Myri-10G. Myrinet-2000 is a superior alternative to Gigabit Ethernet for clusters, whereas Myri-10G offers performance and cost advantages over 10-Gigabit Ethernet. Myri-10G uses the same physical layers (PHYs: cables, connectors, signaling) as 10-Gigabit Ethernet and is highly interoperable with it. In fact, Myri-10G NICs are both 10-Gigabit Myrinet NICs and 10-Gigabit Ethernet NICs.

Myrinet can be arranged in any topology. Table 4 compares the bandwidth, latency and other features of the two series:

Tab.4. Performance comparison of the Myrinet-2000 and Myri-10G series

Prices of Myrinet NICs (data from Myricom). For Myrinet-2000, the prices of NICs are given in Table 5.

Tab.5. Prices of NICs for Myrinet-2000

The prices of NICs for Myri-10G are given in the next tables:

1. For 10-Gigabit Ethernet solutions.

Tab.6. Prices of Myri-10G NICs for 10-Gigabit Ethernet solutions

2. For 10-Gigabit Myrinet and HPC solutions.

Tab.7. Prices of Myri-10G NICs for 10-Gigabit Myrinet and HPC solutions

Prices of Myrinet switches (data from Myricom)

1. For Myrinet-2000 products

Small switches

Tab.8. Small Myrinet-2000 switches

Intermediate-size switch networks

Tab.9. Intermediate-size switch networks

Switch networks for large clusters

Tab.10. Switch networks for large clusters

Clos256: 256 host ports, no inter-switch ports.
Clos256+256: 256 host ports, 256 inter-switch ports on 64 quad-fiber ports.
Spine1280: 10 quad (40) 32-port switches presented on 320 quad-fiber ports (1280 ports total). This configuration may be used as the spine for 5 Clos256+256 units.

2. For Myri-10G products

Tab.11. Myri-10G switches

Myrinet-2000 hardware latency (data from Myricom)

The latency measurements presented here were, unless otherwise noted, between NICs whose ports are connected through short cables and a single switch. The measured latency includes ~0.5 us of total hardware latency in the circuitry between the ports of the Lanai-X chips in the two NICs. This hardware latency is introduced by circuitry that converts to and from the fiber physical layer, and by the switch itself. Networks for more than 512 and up to 8192 hosts with XBar32-based switches have hardware latencies up to ~1.1 us, but the hardware latency is still small compared with the software latency. The "time of flight" latency of a fiber cable is ~0.0065 us per meter, i.e. ~1.3 us for a maximal-length 200 m fiber cable, but only 0.03 us for a 5 m cable.

2.4 Quadrics

Quadrics is a supercomputer company formed in 1996 as a joint venture between Alenia Spazio and the technical team from Meiko Scientific. They produce hardware and software for clustering commodity computer systems into massively parallel systems. The Quadrics hardware products are:

QsNet I - Quadrics interconnect based around the Elan3/Elite3 ASICs (5 us MPI latency)
QsNet II - Quadrics interconnect based around the Elan4/Elite4 ASICs (912 MB/s on SR1400 EM64T and 1.26 us MPI latency on HP DL145G2)
QsNet II E-Series - a range of small-to-medium configurations (8- to 128-way) at less than USD 1,800 per port
QsTenG - 10 Gigabit Ethernet switch for up to 96 ports

The newest performance data supplied by the company are shown in Table 12.

             MPI latency   Bandwidth   Peak bandwidth   Interface
  QsNet-I    5 us          350 MB/s    528 MB/s         PCI 2.1
  QsNet-II   1.22 us       912 MB/s    1064 MB/s        PCI-X 1.0

Tab.12. Feature comparison of QsNet-I and QsNet-II

Nowadays the most popular product is QsNet-II, so we describe its performance in detail. QsNet-II uses a fat-tree topology. QsNet-II can connect at most 4096 nodes (QsNet-I, at most 1024 nodes). The components of a QsNet-II network are the Elan4 network adapter and the Elite4 switch.

Basic performance

The bandwidth of QsNet-II was measured in two ways: a simple ping test, where two nodes communicate with each other in turn, and a batched test, where a number of MPI sends and receives are queued up together. The results are shown in the following figures.

Figure.2. QsNet-II MPI bandwidth
Figure.3. QsNet-II MPI latency
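The batched measurement described above can be approximated with non-blocking MPI calls: a window of sends is posted before waiting for any of them, so several messages are on the wire at once. A minimal illustrative sketch follows; the window size, message size and iteration count are arbitrary choices, not the parameters used by Quadrics.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MSG   (64 * 1024)   /* 64 KB per message          */
    #define WIN   16            /* messages in flight at once */
    #define ITERS 100

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char *buf = malloc((size_t)MSG * WIN);
        MPI_Request req[WIN];
        char ack;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        for (int it = 0; it < ITERS; it++) {
            if (rank == 0) {
                /* queue a whole window of sends, then wait for all of them */
                for (int i = 0; i < WIN; i++)
                    MPI_Isend(buf + (size_t)i * MSG, MSG, MPI_CHAR, 1, 0,
                              MPI_COMM_WORLD, &req[i]);
                MPI_Waitall(WIN, req, MPI_STATUSES_IGNORE);
                MPI_Recv(&ack, 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                for (int i = 0; i < WIN; i++)
                    MPI_Irecv(buf + (size_t)i * MSG, MSG, MPI_CHAR, 0, 0,
                              MPI_COMM_WORLD, &req[i]);
                MPI_Waitall(WIN, req, MPI_STATUSES_IGNORE);
                MPI_Send(&ack, 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
            }
        }

        double t = MPI_Wtime() - t0;
        if (rank == 0)
            printf("batched bandwidth: %.1f MB/s\n",
                   (double)MSG * WIN * ITERS / t / 1e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }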

QsNet-II can be used to build two kinds of systems:
- small systems (up to 8, 32 or 128 nodes) using a single switch;
- larger systems (up to 4096 nodes) using a federated network consisting of multiple switch modules.

QsNet-II uses a fat-tree topology, which permits scaling up to 4096 nodes. QsNet-II hardware is just one part of a complete family of products for building high-performance clusters. The basic building block of QsNet-II switch networks is the QS5A switch chassis. A single chassis can be configured to provide up to 64 ports of switching, implemented as a three-stage fat-tree network. For switches of more than 64 ports, multiple switch chassis are used in a federated network. Federated switching is a packaging solution that enables very large networks to be implemented with two stages of switch chassis. For instance, the configuration of a 256-way system with 4 node-level switches, each connecting 64 nodes, and 2 top-level switches (full bandwidth) is shown in Figure 4.

Figure.4. Configuration of a 256-way system

In this figure, the number 32 denotes a 32-way standalone switch, which can provide double the bandwidth for high-CPU-count SMP nodes. Other configurations are listed in Table 13.

Tab.13. Switch configurations for systems of different sizes

The following figures show the adapters and switches used in QsNet-II:

Fig.5. QsNet-II adapters
Fig.6. QsNet-II switches

2.5 InfiniBand

InfiniBand is a technology standard whose backers include IBM, Intel, Sun, Voltaire, Mellanox, Topspin and others. It is a switched-fabric communications link primarily used in high-performance computing. InfiniBand uses a switched fabric topology, as opposed to a hierarchical switched network like Ethernet.
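On Linux, the state of an InfiniBand fabric can also be examined programmatically through the verbs library (libibverbs). The sketch below merely enumerates the installed HCAs and prints the state and LID of their first port; it is an illustrative example added here, assuming libibverbs is available, and is not taken from the report.

    #include <stdio.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int n = 0;
        struct ibv_device **devs = ibv_get_device_list(&n);
        if (!devs || n == 0) {
            fprintf(stderr, "no InfiniBand devices found\n");
            return 1;
        }

        for (int i = 0; i < n; i++) {
            struct ibv_context *ctx = ibv_open_device(devs[i]);
            if (!ctx)
                continue;

            struct ibv_port_attr port;
            if (ibv_query_port(ctx, 1, &port) == 0)   /* port numbers start at 1 */
                printf("%s: port 1 state=%d lid=%u\n",
                       ibv_get_device_name(devs[i]), port.state, port.lid);

            ibv_close_device(ctx);
        }

        ibv_free_device_list(devs);
        return 0;
    }

Compile with something like gcc ibinfo.c -libverbs, assuming the libibverbs development package is installed.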

The unidirectional MPI bandwidth may approach 860 MB/s, and the lowest MPI latency is 4.5 us (evaluated by Mellanox using a network card with a PCI-X interface; the latency increases to 5.9 us when measured at the MPI application layer). Like Fibre Channel, PCI Express, Serial ATA, and many other modern interconnects, InfiniBand is a point-to-point bidirectional serial link intended for the connection of processors with high-speed peripherals such as disks. It supports several signalling rates and, as with PCI Express, links can be bonded together for additional bandwidth.

Signalling rate

The serial connection's signalling rate is 2.5 gigabits per second (Gbit/s) in each direction per connection. InfiniBand supports double (DDR) and quad (QDR) data speeds, for 5 Gbit/s or 10 Gbit/s respectively, at the same data-clock rate. Links use 8B/10B encoding: every 10 bits sent carry 8 bits of data, so the useful data transmission rate is four-fifths of the raw rate. Thus single, double, and quad data rates carry 2, 4, or 8 Gbit/s respectively. Links can be aggregated in units of 4 or 12, called 4X or 12X. A quad-rate 12X link therefore carries 120 Gbit/s raw, or 96 Gbit/s of useful data. Most systems today use either a 4X 2.5 Gb/s (SDR) or 5 Gb/s (DDR) connection. InfiniBand QDR was already demonstrated during 2007, with production systems expected to follow. Larger systems with 12X links are typically used for cluster and supercomputer interconnects and for inter-switch connections.

Effective theoretical throughput in different configurations:

         Single       Double       Quad
  1X     2 Gbit/s     4 Gbit/s     8 Gbit/s
  4X     8 Gbit/s     16 Gbit/s    32 Gbit/s
  12X    24 Gbit/s    48 Gbit/s    96 Gbit/s

Tab.14. Effective theoretical throughput in different configurations

Latency

The single data rate switch chips have a latency of 200 nanoseconds, and DDR switch chips have a latency of 140 nanoseconds. However, due to the larger effect of the end-points, the total message-delivery latency is much larger: from 1.26 microseconds MPI latency (Mellanox ConnectX HCAs) to 1.29 microseconds (QLogic InfiniPath HTX HCAs) to 2.6 microseconds (Mellanox InfiniHost III HCAs).

Nowadays, Mellanox products are the most widely used. Mellanox offers a dual-port 10 Gb/s PCI-X card, and dual-port and single-port 20 Gb/s PCI Express cards designed to drive the full performance of high-speed InfiniBand fabrics:

ConnectX IB Adapter Cards
ConnectX EN Adapter Cards
InfiniHost Adapter Cards
InfiniHost III Ex Adapter Cards
InfiniHost III Lx Adapter Cards

Detailed data can be found on the Mellanox website.
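As a footnote to the signalling rates above: the figures in Table 14 follow directly from the 8B/10B arithmetic, where the raw rate is 2.5 Gbit/s per lane times the data-rate multiplier (1, 2 or 4) times the link width (1, 4 or 12), and the effective rate is 8/10 of that. A tiny sketch of the calculation, added here purely as an illustration:

    #include <stdio.h>

    int main(void)
    {
        const double lane_sdr = 2.5;            /* Gbit/s per lane at single data rate */
        const int    widths[] = {1, 4, 12};     /* 1X, 4X, 12X link widths             */
        const int    speeds[] = {1, 2, 4};      /* SDR, DDR, QDR multipliers           */
        const char  *names[]  = {"SDR", "DDR", "QDR"};

        for (int w = 0; w < 3; w++)
            for (int s = 0; s < 3; s++) {
                double raw  = lane_sdr * speeds[s] * widths[w];  /* signalling rate    */
                double data = raw * 8.0 / 10.0;                  /* after 8B/10B coding */
                printf("%2dX %s: raw %5.1f Gbit/s, effective %5.1f Gbit/s\n",
                       widths[w], names[s], raw, data);
            }
        return 0;
    }

For example, a 4X QDR link comes out as 40 Gbit/s raw and 32 Gbit/s effective, matching Table 14.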

InfiniBand switches for HPC are offered in a wide range of port counts, presently mostly at a speed of 1.25 GB/s per port. The switches can be configured in any desired topology, but in practice a fat-tree topology is almost always preferred. Now we introduce some switches in detail.

1. Flextronics 288-port 4X InfiniBand DDR Switch

Fig.7. Features of the Flextronics 288-port 4X InfiniBand DDR Switch

2. Mellanox switches

Mellanox InfiniScale - eight-port 10 Gb/s 4X InfiniBand switch. Key features:
- Eight 10 Gb/s InfiniBand 4X ports
- 160 Gb/s of available bandwidth
- Scalable to 32 non-blocking full-wire-speed 1X ports
- Scalable to 192 ports 1X (2.5 Gb/s)
- Supports multi-protocol applications for clustering, communication, and storage
- Integrated SMA and GSA
- Inbound and outbound partition (P_KEY) checking
- 32 integrated 2.5 Gb/s SerDes
- Cut-through switching
- InfiniPCI technology support
- Programmable port mirroring
- Large on-chip port buffering

Mellanox InfiniScale III - 24-port 4X (or 8-port 12X) InfiniBand switch supporting up to 60 Gb/s per port. Key features:
- Twenty-four 10 or 20 Gb/s InfiniBand 4X ports or eight 30 or 60 Gb/s InfiniBand 12X ports (or any combination)
- 480 Gb/s (SDR version) or 960 Gb/s (DDR version) of total switching bandwidth
- Scalable to thousands of ports
- 96 integrated 2.5 Gb/s (SDR version) or 5 Gb/s (DDR version) SerDes interfaces (physical layer)
- Auto-negotiation of port link speed
- Ultra-low-latency cut-through switching (less than 200 nanoseconds)
- MTU size from 256 to 2K bytes

Mellanox InfiniScale IV - fourth-generation InfiniBand switch. Key features:
- Supports multi-protocol applications for clustering, communication, and storage
- Integrated Subnet Management Agent (SMA)
- 36 40 Gb/s InfiniBand 4X ports
- 12 80 Gb/s InfiniBand 8X ports
- 120 Gb/s InfiniBand 12X ports
- Port speed auto-negotiation
- Line-rate switching
- 2.88 Tb/s total switching capacity
- 144 integrated SerDes interfaces, each operating at up to 10+10 Gb/s

Tab.15. Features of Mellanox switches

2.6 Market distribution of several cluster networks

According to the Top500 report of November 2007, InfiniBand and Gigabit Ethernet are growing very fast, while Myrinet shows a downtrend. The distribution can be seen in Figure 8.

Figure.8. Interconnect family share for 11/2007

Conclusions

How can we evaluate whether a cluster network interconnect is good or not? Several primary factors need to be considered: latency, bandwidth, price, and software support. Generally speaking, price is proportional to bandwidth and inversely proportional to latency. To save money, the interconnect has to be chosen according to the application. In short, InfiniBand offers the highest bandwidth and SCI the lowest latency, but SCI and QsNet are the most expensive, followed by InfiniBand, Myrinet, and Gigabit Ethernet. All of these interconnects support MPI, and all except Gigabit Ethernet also provide more efficient communication protocols. SCI and QsNet can support shared memory, but the latency of accessing remote nodes is still at the microsecond level. Looking at the development trend, InfiniBand will become the primary interconnect because of its scaling benefits.

References

[1] Cluster Computing: High-Performance, High-Availability, and High-Throughput Processing on a Network of Computers
[2] QsNet-II Performance Results
[3] Quadrics QsTenG Ethernet Switch Performance Report
