Performance Analysis and Evaluation of LANL's PaScalBB I/O nodes using Quad-Data-Rate Infiniband and Multiple 10-Gigabit Ethernets Bonding

Hsing-bung Chen, Alfred Torrez, Parks Fields
HPC-5, Los Alamos National Lab, Los Alamos, New Mexico 87111, USA
{hbchen, atorrez, parks}@lanl.gov

Juan C. Franco, Daniel Illescas, Rocio Perez-Medina, Jharrod LaFon, Ben Haynes, John Herrera
INST-OFF, HPC Summer School, Los Alamos National Lab

Abstract - In LANL's PaScalBB network, I/O nodes carry data traffic between backend compute nodes and global scratch-based file systems. An I/O node is normally equipped with one Infiniband NIC for backend traffic and one or more 10-Gigabit Ethernet NICs for parallel file system data traffic. With the growing deployment of multiple, multi-core processors in server and storage systems, overall platform efficiency and CPU and memory utilization depend increasingly on interconnect bandwidth and latency. PCI-Express (PCIe) generation 2.0 has recently become available and has doubled the transfer rates available. This additional I/O bandwidth balances the system and makes higher data rates for external interconnects such as Infiniband feasible. As a result, Infiniband Quad-Data Rate (QDR) mode has become available on the Infiniband Host Channel Adapter (HCA) with a 40 Gb/sec signaling rate. Combining HCA QDR data rates with multiple 10-Gigabit Ethernet links in an I/O node has created the potential to solve some of the I/O traffic bottlenecks that currently exist. We have set up a small-scale PaScalBB testbed and conducted a sequence of I/O node performance tests. The goal of this I/O node performance testing is to determine an enhanced network configuration that we can apply to LANL's Cielo machine and to future LANL HPC machines using the PaScalBB architecture.

Keywords - Server I/O networking, High Performance Networking, Infiniband, 10 Gigabit Ethernet, Link aggregation, Load balancing

1 INTRODUCTION

Commercial off-the-shelf cluster computing systems have delivered reasonable performance to technical and commercial areas for years. High speed computing, global storage, and networking (IPC and I/O) are the three most critical elements for building a large-scale HPC cluster system. Without these three elements being well balanced, we cannot fully utilize an HPC cluster. High data bandwidth I/O networking provides a data super-highway to meet the needs of constantly increasing computation power and storage capacity.

LANL's PaScalBB server I/O architecture is designed to support data-intensive scientific applications running on very large-scale clusters. The main goal of PaScalBB is to provide high performance, efficient, reliable, parallel, and scalable I/O capabilities for such applications. Data-intensive scientific simulation-based analysis normally requires efficient transfer of a huge volume of complex data among simulation, visualization, and data manipulation functions. To date, PaScalBB has been implemented on most of the HPC production machines at LANL: Roadrunner (the first Petaflops machine), RedTail, LOBO, Turing, TLCC, etc.

I/O nodes are used in LANL's PaScalBB network to carry data traffic between backend compute nodes and global scratch-based file systems. An I/O node is normally equipped with one Infiniband NIC for backend IPC traffic and one or more 10-Gigabit Ethernet NICs for parallel file system data traffic. With the growing deployment of multiple, multi-core processors in server and storage systems, overall
platform efficiency and CPU and memory utilization depend increasingly on interconnect bandwidth and latency. PCI-Express (PCIe) generation 2.0 has recently become available and has doubled the transfer rates available. This additional I/O bandwidth balances the system and makes higher data rates for external interconnects such as Infiniband feasible. As a result, Infiniband Quad-Data Rate (QDR) mode has become available on the Infiniband Host Channel Adapter (HCA) with a 40 Gb/sec signaling rate. Combining HCA QDR rates with multiple 10-Gigabit Ethernet links has the potential to solve some of the I/O traffic bottlenecks that currently exist. We have set up a small-scale PaScalBB test bed and conducted a sequence of I/O node performance tests. The goal of this testing is to determine an enhanced network configuration that we can apply to LANL's Cielo machine and to future LANL HPC machines using the PaScalBB architecture.

The rest of this paper is organized as follows. In section two we describe LANL's PaScalBB server I/O infrastructure. Section three introduces Infiniband/QDR and 10-Gigabit Ethernet technologies. We then illustrate our experimental setup and discuss testing results and performance data in section four. Finally, we present our conclusions and future work in section five.

2 PASCALBB SERVER I/O BACKBONE ARCHITECTURE

LANL's PaScalBB [10] adopts several hardware and software components to provide a unique and scalable server I/O networking architecture. Figure-1 illustrates the system components used in PaScalBB.

2.1 Hardware Components used in PaScalBB

2.1.1 Level-1 High Speed Interconnection Network

The Level-1 interconnect uses (a) high speed interconnect systems such as Quadrics, Myrinet, or Infiniband to fulfill the requirements of low latency, high speed, high bandwidth cluster IPC communication and (b) aggregated I/O-aware multi-path routes for load balancing and failover.

2.1.2 Level-2 IP based Interconnection Network

The Level-2 interconnect uses multiple Gigabit Ethernet switches/routers with layer-3 network routing support to provide latency-tolerant I/O communication and global IP based storage systems. Without using a federated network solution, we can linearly expand the Level-2 IP based network by employing a global host domain multicasting feature in the metadata servers of a global file system. With this support we can maintain a single name space global storage system and provide a linear cost growth path for I/O networking.

2.1.3 Compute node

A Compute node is equipped with at least one high-speed interface card connected to a high-speed interconnect fabric in Level-1. The node is set up with Linux multi-path equalized routing to multiple available I/O nodes for load balancing and failover (high availability). A Compute node is used for computing only and is not involved in any routing activities.

2.1.4 I/O node

An I/O routing node has two types of network interfaces. One high-speed interface card is connected to the Level-1 network for communication with Compute nodes, and one or more Gigabit Ethernet interface cards (bondable) are connected to the Level-2 linearly scaling Gigabit switches. I/O nodes serve as the routing gateways between the Level-1 and Level-2 networks. Every I/O node has the same networking capability.

2.2 System Software Components used in PaScalBB

2.2.1 Equal Cost Multi-path routing for load balancing

Multi-path routing is used to provide balanced outbound traffic to the multiple I/O gateways. It also supports failover and dead-gateway detection for choosing good routes among the active I/O gateways. Linux multi-path routing is a destination address-based load-balancing algorithm. Multi-path routing should improve system performance through load balancing and reduced end-to-end delay; it overcomes the capacity constraint of single-path routing and routes through less congested paths. Each Compute node is set up with N-way multi-path routes through N I/O nodes. Multi-path routing also balances the bandwidth gap between the Level-1 and the Level-2 interconnects. We use the Equal Cost Multi-path (ECMP) routing strategy on compute nodes so that compute nodes can evenly distribute traffic workloads over all I/O nodes. With this bi-directional multi-path routing, we can sustain parallel data paths for both write (outbound) and read (inbound) data transfers. This is especially useful when applied to concurrent socket I/O sessions on IP based storage systems: PaScalBB can evenly allocate socket I/O sessions to the available I/O routing nodes.
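To make the compute-node route setup concrete, the sketch below shows one way (not necessarily the production PaScalBB configuration) to install an N-way equal-cost multi-path default route over several I/O-node gateways using the Linux iproute2 tool. The gateway addresses and the IPoIB interface name ib0 are hypothetical.

import subprocess

# Hypothetical Level-1 (IPoIB) addresses of the N I/O-node gateways.
IO_NODE_GATEWAYS = ["192.168.10.1", "192.168.10.2", "192.168.10.3", "192.168.10.4"]
LEVEL1_INTERFACE = "ib0"   # compute node's Infiniband (IPoIB) interface
APPLY = False              # set True (and run as root) to actually install the route

def build_ecmp_route_command(gateways, dev):
    """Build one 'ip route replace' command with an equal-weight nexthop per
    I/O-node gateway, so outbound traffic is spread across all of them."""
    cmd = ["ip", "route", "replace", "default", "scope", "global"]
    for gw in gateways:
        cmd += ["nexthop", "via", gw, "dev", dev, "weight", "1"]
    return cmd

if __name__ == "__main__":
    cmd = build_ecmp_route_command(IO_NODE_GATEWAYS, LEVEL1_INTERFACE)
    print("ECMP route command:", " ".join(cmd))
    if APPLY:
        subprocess.run(cmd, check=True)

With equal weights, the kernel's destination-based route selection spreads concurrent socket sessions across all listed gateways, matching the N-way load balancing described above.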
I/O nodes are used heavily in LANL's PaScalBB network to carry data traffic between backend compute nodes and global scratch-based file systems. An I/O node is normally equipped with one Infiniband NIC for backend IPC traffic and one or more 10-Gigabit Ethernet NICs for parallel file system data traffic [6][7][8].

3 INFINIBAND AND 10 GIGABIT ETHERNET

Infiniband [3] is a standard switched fabric communication link used in high performance computing and enterprise data centers. The InfiniBand Architecture (IBA) is designed to provide high bandwidth, low-latency computing; the scalability to support thousands of nodes and multiple processor cores per server; and efficient utilization of compute processing resources. The TOP-500 list published in November 2010 shows that more than 42% of the computing systems use Infiniband as their primary high-speed interconnect. The growth rate of Infiniband in the TOP-500 systems is about 30%. This is an indication of strong momentum in the adoption of Infiniband technology in the HPC and Enterprise communities.

Ethernet has long been the dominant LAN technology. Now the availability of 10-Gigabit Ethernet has enabled new applications in the data center and in IP based storage systems. Because 10-Gigabit Ethernet is based on the core Ethernet technology, it takes advantage of the wealth of improvements that have been developed over the years and simplifies the migration to this higher-speed technology.

With the growing deployment of multiple, multi-core processors in server and storage systems, overall platform efficiency and CPU and memory utilization depend increasingly on interconnect bandwidth and latency. PCI-Express (PCIe) generation 2.0 has recently become available and has doubled the transfer rates available. This additional I/O bandwidth balances the system and makes higher data rates for external interconnects such as Infiniband feasible. As a result, Infiniband Quad-Data Rate (QDR) mode has become available on the Infiniband Host Channel Adapter (HCA) with a 40 Gb/sec signaling rate. Combining Infiniband HCA QDR data rates with multiple 10-Gigabit Ethernet links in I/O nodes has created the potential to solve some of the I/O traffic bottlenecks that currently exist in HPC machines.
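As a rough back-of-the-envelope check (our own arithmetic, not a figure from the measurements in section four): a 4x QDR link signals at 40 Gb/sec but uses 8b/10b encoding, so its peak data rate is about 32 Gb/sec, or roughly 4 GB/sec per direction, while each 10-Gigabit Ethernet link tops out near 1.25 GB/sec. The short calculation below illustrates why three to four bonded 10GigE links are needed on the Level-2 side of an I/O node to roughly balance one QDR HCA on the Level-1 side.

# Back-of-the-envelope comparison of one IB/QDR HCA against bonded 10GigE links.
# These are theoretical peaks under our assumptions, not measured results.

QDR_SIGNALING_GBPS = 40.0          # 4x QDR signaling rate
ENCODING_EFFICIENCY = 8.0 / 10.0   # SDR/DDR/QDR use 8b/10b encoding
TEN_GIGE_GBPS = 10.0               # per-link Ethernet line rate

qdr_data_gbytes = QDR_SIGNALING_GBPS * ENCODING_EFFICIENCY / 8.0   # ~4.0 GB/sec
ten_gige_gbytes = TEN_GIGE_GBPS / 8.0                              # ~1.25 GB/sec

for links in (1, 2, 3, 4):
    bonded = links * ten_gige_gbytes
    print(f"{links} x 10GigE = {bonded:.2f} GB/sec "
          f"({bonded / qdr_data_gbytes:.0%} of one QDR HCA at {qdr_data_gbytes:.1f} GB/sec)")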

4 EXPERIMENTAL TESTING SETUP AND PERFORMANCE EVALUATION

We set up a small-scale PaScalBB test bed and conducted a sequence of I/O node performance tests.

4.1 Testing setup and configuration

Hardware equipment includes (a) twelve Linux server machines (Intel Nehalem 5600 dual quad-core with 16GB DDR3 memory): seven Compute nodes with one Mellanox ConnectX Infiniband QDR HCA on each compute node, one I/O node with a Mellanox ConnectX Infiniband QDR HCA [10] and multiple Mellanox ConnectX 10-Gigabit Ethernet NICs, and four data nodes with one 10-Gigabit Ethernet connection on each node; (b) one Mellanox 36-port Infiniband QDR switch; and (c) one Arista 24-port 10-Gigabit Ethernet switch [11]. Software components include (a) Fedora 12 / Linux 64-bit OS, (b) OFED (OpenFabrics Enterprise Distribution) [9] Infiniband/10-Gigabit Ethernet system software, (c) the Linux Ethernet bonding driver, and (d) netperf [12], a network performance benchmark.

4.2 Performance testing and evaluation

4.2.1 Infiniband SDR/DDR/QDR performance testing

Figure-2 shows the one-way communication results for IB/SDR (single data rate), IB/DDR (double data rate), and IB/QDR (quad data rate). This figure illustrates an improvement of 75% in bi-directional bandwidth when moving from DDR to QDR. Figure-3 shows the latency testing results for IB/SDR, IB/DDR, and IB/QDR. This result demonstrates the advantage of using QDR in terms of lower latency. Figure-4 shows MPI I/O testing using various message packet sizes from 1MB to 200MB. This result shows that IB/QDR can persistently provide consistent bandwidth when various message sizes are applied in MPI applications. Figure-5 shows the results of (a) QDR/UC (unreliable connection) one-way communication bandwidth, (b) QDR/RC (reliable connection) one-way communication bandwidth, and (c) QDR/SRQ (shared receive queue) bi-directional communication bandwidth. We can see that IB/QDR can reach a peak of 5600+ MB/sec bi-directional bandwidth from multiple streams of netperf testing.

4.2.2 10-Gigabit Ethernet performance testing

Figure-6 shows the performance results for a back-to-back connection using one single 10-Gigabit Ethernet link between two server nodes. We can reach 95% of the bandwidth of a physical 10-Gigabit link. Figure-7 shows the performance result for a triple 10-Gigabit Ethernet bonding back-to-back connection. This figure illustrates that we can reach a peak of 2300MB/sec bandwidth from three-link 10GigE bonding. Figure-8 shows the performance result for a quad 10-Gigabit Ethernet bonding back-to-back connection. It only improves bandwidth by 5%-10% compared with the three-link 10-Gigabit Ethernet bonding. This may be due to the Ethernet chipset processing capability or the Linux TCP/IP software stack.

4.2.3 I/O node performance testing and justification

Figure-9 shows the results of using four compute nodes to send concurrent multiple streams of netperf data traffic through one I/O node, arriving at four different data nodes. The data includes the four individual link bandwidths and the accumulated data bandwidth, which can reach about 2950MB/sec. Figure-10 shows the result of using seven compute nodes; we can push the bandwidth to 4100MB/sec. Figure-9 and Figure-10 prove that we can gain more bandwidth when more compute nodes are involved in sending network traffic. This also demonstrates the scaling capability of LANL's PaScalBB server I/O infrastructure.
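The multi-stream tests behind Figure-9 and Figure-10 can be reproduced in spirit with a small driver such as the sketch below, which launches one netperf TCP_STREAM from a compute node to each data node and sums the reported throughput. The host names and test length are hypothetical, netperf's netserver is assumed to be running on each data node, and the parsing assumes netperf's default output format.

import subprocess
from concurrent.futures import ThreadPoolExecutor

DATA_NODES = ["data1", "data2", "data3", "data4"]  # hypothetical data-node host names
TEST_SECONDS = 30

def run_stream(host):
    """Run one netperf TCP_STREAM test and return throughput in 10^6 bits/sec,
    taken from the last column of netperf's default output (assumption)."""
    out = subprocess.run(["netperf", "-H", host, "-t", "TCP_STREAM",
                          "-l", str(TEST_SECONDS)],
                         capture_output=True, text=True, check=True).stdout
    last_line = [ln for ln in out.splitlines() if ln.strip()][-1]
    return float(last_line.split()[-1])

if __name__ == "__main__":
    # One concurrent stream per data node, all routed through the I/O node.
    with ThreadPoolExecutor(max_workers=len(DATA_NODES)) as pool:
        rates = list(pool.map(run_stream, DATA_NODES))
    for host, mbps in zip(DATA_NODES, rates):
        print(f"{host}: {mbps / 8:.0f} MB/sec")
    print(f"aggregate: {sum(rates) / 8:.0f} MB/sec")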
In Figure-11, we verify the advantage of using the Linux Ethernet bonding capability. We tried two Ethernet bonding algorithms implemented in the Linux kernel: mode-0 and mode-5. Linux Ethernet bonding mode-0, named balance-rr (round-robin policy), transmits data packets in sequential order from the first available slave through the last; this mode provides load balancing and fault tolerance. Linux Ethernet bonding mode-5, named balance-tlb (adaptive transmit load balancing), supports channel/port bonding that does not require any special switch support. Outgoing data traffic is distributed according to the current load on each slave link, while incoming data traffic is received by the current slave link. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave. The purpose of this testing is to determine the traffic load-balancing algorithm that best suits the parallel file systems used in HPC machines. Our results show that mode-5 (adaptive transmit load balancing) obtains 10%-15% more bandwidth than mode-0 (a simple round-robin policy).

From the above results, we conclude that there is a clear advantage to bonding multiple 10-Gigabit Ethernet links in an I/O node when transferring data through an IB/QDR link. We also learned how to tune the 10-Gigabit Ethernet bonding algorithms to best fit an HPC parallel file system such as the Panasas PanFS ActiveScale parallel file storage system.
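For reference, the sketch below shows one way to assemble a mode-5 (balance-tlb) bond over several 10-Gigabit Ethernet ports through the bonding driver's sysfs interface. The interface names eth2-eth4 are hypothetical, the bonding module is assumed to be already loaded, IP addressing is omitted, and a production I/O node would normally carry this configuration in its distribution network scripts rather than in a script like this.

import subprocess

BOND = "bond0"
SLAVES = ["eth2", "eth3", "eth4"]   # hypothetical 10GigE ports on the I/O node
MODE = "balance-tlb"                # bonding mode-5, adaptive transmit load balancing

def sysfs_write(path, value):
    """Write a value into the bonding driver's sysfs interface (needs root)."""
    with open(path, "w") as f:
        f.write(value)

def build_bond():
    sysfs_write("/sys/class/net/bonding_masters", f"+{BOND}")       # create the bond device
    sysfs_write(f"/sys/class/net/{BOND}/bonding/mode", MODE)        # set mode before enslaving
    sysfs_write(f"/sys/class/net/{BOND}/bonding/miimon", "100")     # link monitoring interval (ms)
    for eth in SLAVES:
        subprocess.run(["ip", "link", "set", eth, "down"], check=True)
        sysfs_write(f"/sys/class/net/{BOND}/bonding/slaves", f"+{eth}")
    subprocess.run(["ip", "link", "set", BOND, "up"], check=True)

if __name__ == "__main__":
    build_bond()

Switching the mode string to balance-rr reproduces the mode-0 configuration compared in Figure-11.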

5 CONCLUSIONS AND FUTURE WORK

We evaluated the bandwidth performance of IB/SDR, IB/DDR, and IB/QDR, and we evaluated various bonding algorithms over multiple 10-Gigabit Ethernet links. We verified the capability of an I/O node equipped with one IB/QDR HCA and multiple 10-Gigabit Ethernet links, studied the Linux Ethernet bonding algorithms, and observed the scaling capability of an I/O node as it handles more network traffic. From this we worked out an improved network setup and configuration for LANL's PaScalBB network, and we have applied our testing results to LANL's production machines.

As part of future work, we intend to conduct evaluations on larger test beds, possibly using some available production HPC machines, and to study the impact of new PaScalBB network setups and configurations. We also intend to carry out more in-depth studies applying different network benchmarking, MPI-IO testing, and parallel file system testing.

REFERENCES

[1] Hari Subramoni, Matthew Koop and Dhabaleswar K. Panda, "Designing Next Generation Clusters: Evaluation of InfiniBand DDR/QDR on Intel Computing Platforms", HOTI, IEEE Annual Symposium on High-Performance Interconnects.
[2] Matthew J. Koop, Wei Huang, Karthik Gopalakrishnan, Dhabaleswar K. Panda, "Performance Analysis and Evaluation of PCIe 2.0 Quad-Data Rate Infiniband", HOTI, IEEE Annual Symposium on High-Performance Interconnects.
[3] Infiniband Roadmap, InfiniBand Trade Association.
[4] HPC Advisory Council Network of Expertise, "Interconnect Analysis: 10GigE and InfiniBand in High Performance Computing", 2009.
[5] Munira Hussain, Gilad Shainer, Tong Liu, Onur Celebioglu, "Comparing DDR and QDR Infiniband on 11th-Generation Dell PowerEdge Clusters", Dell Power Solutions, 2010 Issue 1.
[6] Gary Grider, Hsing-bung Chen, James Nunez, Steve Poole, Rosie Wacha, Parks Fields, Robert Martinez, Paul Martinez, Satsangat Khalsa, "PaScal - A New Parallel and Scalable Server IO Networking Infrastructure for Supporting Global Storage/File Systems in Large-size Linux Clusters", Proceedings of the 25th IEEE International Performance, Computing, and Communications Conference (IPCCC 2006), April 2006.
[7] Hsing-bung Chen, Gary Grider, Parks Fields, "A Cost-Effective, High Bandwidth Server I/O Network Architecture for Cluster Systems", 2007 IEEE IPDPS Conference.
[8] Hsing-bung Chen, Parks Fields, Alfred Torrez, "An Intelligent Parallel and Scalable Server I/O Networking Environment for High Performance Cluster Computing Systems", PAPTA 2008 Conference.
[9] OFED - OpenFabrics Enterprise Distribution.
[10] Mellanox Technologies.
[11] Arista Networks.
[12] Netperf.

Figure 1: System diagram of LANL's PaScalBB server I/O architecture. Compute nodes use outbound N-way load-balancing multi-path routing over the Level-1 interconnect; the Level-2 interconnect carries inbound M-way multiple streams to the global file system; I/O nodes/VLANs use OSPF to route inbound and outbound traffic between the Level-1 and Level-2 networks.

Figure 2: IB/SDR, IB/DDR, and IB/QDR performance testing
Figure 3: IB/SDR, IB/DDR, and IB/QDR latency testing
Figure 4: Multithread MPI testing using IB/QDR
Figure 5: IB/QDR bi-directional bandwidth testing

Figure 6: Back-to-back single 10-Gigabit Ethernet testing
Figure 7: Three 10-Gigabit Ethernet bonding performance testing
Figure 8: Four 10-Gigabit Ethernet bonding performance testing
Figure 9: Scaling testing using four compute nodes
Figure 10: Scaling testing using seven compute nodes
Figure 11: Linux bonding mode-0 vs. mode-5 testing

This work was carried out under the auspices of the National Nuclear Security Administration of the US Department of Energy at Los Alamos National Laboratory.
