Fabric Bandwidth Comparisons on OpenVPX Backplanes

OpenVPX Backplane Fabric Choice Calls for Careful Analyses

There's a host of factors to consider when evaluating the system bandwidth of various OpenVPX fabrics. A detailed comparison of 10 Gbit Ethernet, Serial RapidIO and InfiniBand sheds some light.

Peter Thompson, Director of Applications, Military and Aerospace, GE Intelligent Platforms

When evaluating choices between interconnect fabrics and topologies as part of a systems engineering exercise, there are many factors to be considered. Papers have been published that purport to illustrate the advantages of some schemes over others. However, some of these analyses adopt a simplistic model of the architectures that can be misleading when it comes to mapping a real-world problem onto such systems. A more rigorous approach to the analysis is needed in order to derive metrics that are more meaningful, and which differ significantly from those produced by a simplistic analytical approach.

This article compares the bandwidth available to two common types of dataflow for systems based on the VITA 65 CEN16 central switched topology, using three different fabrics: Serial RapidIO (SRIO), 10 Gigabit Ethernet (10GbE) and Double Data Rate InfiniBand (DDR IB). The analysis will show that the difference in routing for the three fabrics on the CEN16 backplane is minimal, that for the use cases presented 10GbE is closer in performance to SRIO than is claimed elsewhere, and that DDR IB matches SRIO.

The System Architecture

For the purposes of this analysis, consider an OpenVPX CEN16 backplane that allows for 14 payload (processor) slots and two switch slots (Figure 1). This is a non-uniform topology that routes one connection from each payload slot to each of the two switch slots, plus one connection to the adjacent slot on each side.

First consider a system built from boards that each have two processing nodes (which may be multicore), where each node is connected via Gen2 Serial RapidIO (SRIO) to an onboard switch, which in turn has four connections to the backplane. Each processor has two connections to that switch. That is then compared with a similar system based around dual-processor boards with two 10GbE links per node and no onboard switch. If, as in the case of the GE DSP280 multiprocessor board, a Mellanox network interface chip is used, the same board can be software-reconfigured to support InfiniBand: in fact, the only change required to migrate from 10GbE to DDR InfiniBand is to change the system central switch from 10GbE (for example, GE's GBX460) to InfiniBand (such as GE's IBX400). The backplane can remain unchanged, as can the payload boards. The interfaces can be software-selected as 10GbE or IB. By using appropriate middleware such as AXIS or MPI, the application code remains agnostic to the fabric.
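To make the slot-level wiring concrete, here is a minimal sketch that enumerates the backplane links just described: one link from every payload slot to each central switch, plus one link between each pair of adjacent payload slots. The slot labels and the Python modeling are illustrative assumptions, not part of any OpenVPX specification.

```python
# Illustrative model of the CEN16 slot-level connectivity described above.
# Slot labels are hypothetical; only the link pattern follows the text.

PAYLOAD_SLOTS = list(range(1, 15))   # 14 payload (processor) slots
SWITCH_SLOTS = ["SW_A", "SW_B"]      # the two central switch slots

links = []

# One connection from each payload slot to each of the two switch slots.
for p in PAYLOAD_SLOTS:
    for sw in SWITCH_SLOTS:
        links.append((p, sw))

# One connection between each pair of adjacent payload slots.
for p in PAYLOAD_SLOTS[:-1]:
    links.append((p, p + 1))

# In this model, interior slots end up with four backplane connections (two
# to the central switches, one to each neighbor); the end slots have three.
for p in PAYLOAD_SLOTS:
    degree = sum(1 for a, b in links if p in (a, b))
    print(f"payload slot {p:2d}: {degree} backplane links")
```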

Figure 1: An OpenVPX CEN16 backplane that allows for 14 payload (processor) slots and two switch slots. This is a non-uniform topology that routes one connection from each payload slot to each of the two switch slots, plus one connection to the adjacent slot on each side. (The original diagram also shows the utility plane, power, and the IPMB management connections from each slot's IPMC to the two chassis managers; slot numbers are logical, and physical slot numbers may differ.)

Assumptions and Rate Arithmetic

It is assumed that the switches for all three fabrics are non-blocking for the number of ports that each switch chip supports. However, as will be seen, the number of chips and the hierarchy used to construct a 20-port central switch board can have a significant impact on the true network topology, and therefore on the bandwidth available to an application.

One factor that can be overlooked is that in addition to the primary data fabric connections, there can be an alternate path between nodes on the same board that can be seamlessly leveraged by the application. For example, GE's DSP280 multiprocessor board has eight lanes of PCIe Gen2 between the two processors via a switch with non-transparent bridging capability. This adds a path with up to 32 Gbit/s available. It's important that the inter-processor communication software is able to leverage mixed data paths within a system. The AXIS tools from GE can do that: they can be used to build a dataflow model that represents the algorithm's needs, and the user has complete control over which interconnect mechanism is used for each data link.

SRIO Gen2 (which is only just starting to emerge on products as of early 2011) runs at 5 GHz with the chipsets commonly in use. A 4-lane connection, with the overhead of 8b/10b encoding, yields a raw rate of 4 x 5 Gbit/s x 0.8 = 16 Gbit/s. 10GbE clocks at 3.125 GHz on 4 lanes with the same 8b/10b encoding, so it has a raw rate of 4 x 3.125 Gbit/s x 0.8 = 10 Gbit/s. DDR InfiniBand clocks at 5 GHz, with a raw rate of 4 x 5 Gbit/s x 0.8 = 16 Gbit/s. Mellanox interface chips that support both 10GbE and IB have been available and deployed for some time now, and are considered a mature technology with widespread adoption in mainstream high-performance computing.
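The rate arithmetic above is simple enough to check mechanically. The following sketch (plain Python, nothing fabric-specific) reproduces the three raw link rates from the lane counts, the lane clocks and the 0.8 payload efficiency of 8b/10b encoding:

```python
# Effective per-link rates from the article's arithmetic: 4-lane links with
# 8b/10b encoding, which delivers 8 payload bits per 10 line bits (x 0.8).

def raw_rate_gbps(lanes: int, lane_ghz: float, encoding: float = 0.8) -> float:
    """Raw data rate in Gbit/s for a multi-lane serial link."""
    return lanes * lane_ghz * encoding

srio_gen2 = raw_rate_gbps(4, 5.0)     # 4 x 5.0   x 0.8 = 16 Gbit/s
ten_gbe   = raw_rate_gbps(4, 3.125)   # 4 x 3.125 x 0.8 = 10 Gbit/s
ddr_ib    = raw_rate_gbps(4, 5.0)     # 4 x 5.0   x 0.8 = 16 Gbit/s

print(f"SRIO Gen2: {srio_gen2:g} Gbit/s, 10GbE: {ten_gbe:g} Gbit/s, "
      f"DDR IB: {ddr_ib:g} Gbit/s")
```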

Embedded System Architecture

Now consider a system built from fourteen such boards in an OpenVPX chassis with a backplane that conforms to the BKP6-CEN16-11.2.2.n profile. This supports fourteen payload boards and two central switch boards, and yields a nominal interconnect diagram as shown in Figure 2 for the SRIO case. For 10GbE or InfiniBand, the same backplane results in an interconnect mapping that is represented in Figure 3.

Figure 2: This interconnect diagram for the Serial RapidIO use case has an OpenVPX backplane that conforms to the BKP6-CEN16-11.2.2.n profile. This supports 14 payload boards and two central switch boards.

Those diagrams do not tell the whole story, however. They would be correct if the central switches shown were constructed from a single, non-blocking, 18- to 20-port switch device. However, this is not the case for all the fabrics. In the 10GbE case, a GBX460 switch card can be used, which employs a single 24-port switch chip. For an InfiniBand system, the IBX400 can be used, which has a single 36-port switch chip where each port is x4 lanes wide. In the case of SRIO Gen2, the switch chip commonly selected is a device that supports 48 lanes, in other words 12 ports of x4 links. In order to construct a switch of higher order, it is necessary to use several chips in some kind of a tree structure. Here a tradeoff must be made between the number of chips used and the overall performance of the aggregated switch.

All-to-All Bandwidth Measurement

When evaluating network architectures, a common approach is to look at an all-to-all exchange of data, as it represents a common problem encountered in embedded processing systems: a distributed corner turn of matrix data. This is a core function in synthetic aperture radars, for instance. It is commonly seen when the processing algorithm calls for a two- (or higher) dimensional array to be subjected to a two- (or higher) dimensional Fast Fourier Transform. In order to meet system time constraints, the transform is often distributed across many processor nodes. Between the row FFTs and the column FFTs, the data must be exchanged between nodes. This requires an all-to-all exchange of data that can tax the available bandwidth of a system; the sketch below shows where that exchange arises.
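The following single-process numpy sketch performs a 2D FFT the distributed way: row FFTs, a corner turn (transpose), then row FFTs again. The striping into `n_nodes` slices is an illustrative stand-in for data that would really live on separate processor nodes:

```python
import numpy as np

n_nodes, n = 4, 256
data = np.random.rand(n, n) + 1j * np.random.rand(n, n)

# Stage 1: each "node" FFTs its stripe of rows.
stripes = np.array_split(data, n_nodes, axis=0)
rows = np.vstack([np.fft.fft(s, axis=1) for s in stripes])

# Corner turn: globally a transpose. Distributed across nodes, each node must
# ship one block to every other node, the all-to-all exchange discussed above.
turned = rows.T.copy()

# Stage 2: FFT the former columns, now contiguous rows on each node.
result = np.fft.fft(turned, axis=1).T

assert np.allclose(result, np.fft.fft2(data))  # matches a direct 2D FFT
```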
A simple analysis of this topology might make the following assumptions: there are links between nodes on each board via the onboard switch, there are links to nodes on adjacent cards via the links between the onboard switches, and there are 22 connections made via the central switches. In this approach, the overall performance for an all-to-all exchange might be assumed to be determined by the lowest aggregate bandwidth of these three connection types, in other words that of a single link divided by the number of connections. This equates to 4 lanes x 5 Gbit/s x 0.8 encoding / 22 nodes = 0.73 Gbit/s.

If we apply the same simplistic analysis to the 10GbE system, it suggests that the bandwidth available for all-to-all transfers is 4 lanes x 3.125 Gbit/s x 0.8 encoding across the x8 connections between switches / 368 paths, or roughly 0.22 Gbit/s per path when using 10GbE. That means SRIO has an apparent speed advantage of 3.4 to 1.

However, this is a flawed analysis and gives a misleading impression as to the relative performance that might be expected from the two systems when doing a corner turn. The two architectures are evaluated with different methods: one by dividing the worst-case link bandwidth by the number of processors sharing it, and the other by dividing the worst-case link bandwidth by the number of links that share it. The arithmetic is reproduced below.
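For reference, this snippet reproduces the simplistic arithmetic as quoted, mixing a per-node division for SRIO with a per-path division for 10GbE; the point is precisely that the two figures are not comparable:

```python
# The simplistic method from the text: divide a worst-case link by how many
# nodes (SRIO) or paths (10GbE) share it. The divisions are not like for like.

srio_link = 4 * 5.0 * 0.8             # 16 Gbit/s per x4 SRIO link
srio_simple = srio_link / 22          # shared by 22 far-side nodes -> 0.73

tengbe_trunk = 8 * (4 * 3.125 * 0.8)  # x8 trunk of 10GbE links = 80 Gbit/s
tengbe_simple = tengbe_trunk / 368    # shared by 368 paths -> ~0.22

print(f"SRIO:  {srio_simple:.2f} Gbit/s per node")
print(f"10GbE: {tengbe_simple:.2f} Gbit/s per path")
# ~3.3 by this arithmetic; the article rounds it to 3.4 to 1.
print(f"apparent advantage: {srio_simple / tengbe_simple:.1f} : 1")
```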
Figure 3: This interconnect diagram has an OpenVPX backplane with an interconnect mapping for 10GbE or InfiniBand.

Architecture Matters

A second potential error is to ignore the internal architecture of each switch device, as this can have an effect in cases where the switch does not have balanced bandwidth across its ports. However, the biggest flaw is the suggestion that the performance of a non-uniform tree architecture can be modeled by deriving the lowest-bandwidth connection in the system. In network theory, it is widely accepted that the best metric for the expected performance of such a system is its bisection bandwidth. The bisection bandwidth of a system is found by dividing the system into two equal halves along a dividing line, and enumerating the rate at which data can be communicated between the two halves.

Reconsidering the network diagram of the SRIO system, the bisection width is defined by the number of paths that the median line crosses, which adds up to 19. Similarly, the bisection width of the 10GbE or DDR IB system also adds up to 19. Given that the link bandwidth for the SRIO system is 16 Gbit/s and for 10GbE is 10 Gbit/s, the bisection bandwidth of the SRIO system is 19 x 16 = 304 Gbit/s, and for the 10GbE system it is 19 x 10 = 190 Gbit/s. This represents an expected performance ratio for the total exchange scenario of 1.6 to 1 in favor of the SRIO system, not the 3.4 to 1 predicted in the simplistic model. If we now replace the 10GbE switch with an InfiniBand switch, which fits the same slot and backplane profiles, the bisection bandwidth is 19 x 16 = 304 Gbit/s. Therefore the performance of DDR InfiniBand matches that of SRIO.
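The bisection arithmetic is equally mechanical. This sketch takes the bisection width of 19 links from the count above as a given (it depends on where the median line falls on this particular backplane) and multiplies it by the per-link rates:

```python
# Bisection bandwidth as defined above: cut the system into two equal halves
# and total the link rate crossing the cut.

BISECTION_WIDTH = 19  # links crossed by the median cut, per the article

link_rate = {"SRIO": 16.0, "10GbE": 10.0, "DDR IB": 16.0}  # Gbit/s per x4 link

bisection = {fabric: BISECTION_WIDTH * rate for fabric, rate in link_rate.items()}
for fabric, bw in bisection.items():
    print(f"{fabric:7s}: {bw:.0f} Gbit/s")   # 304 / 190 / 304 Gbit/s

print(f"SRIO : 10GbE = {bisection['SRIO'] / bisection['10GbE']:.1f} : 1")  # 1.6
```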

Bandwidth Calculations: Pipeline Case

Another dataflow model commonly considered is a pipeline, where data streams from node to node in a linear manner. When designing such a dataflow, it is normal to map the tasks and flow to the system in an optimal manner. This can include using different fabric connections for different parts of the flow. A good IPC library and infrastructure will allow the designer to do so without requiring any modifications to the application code; AXIS has this characteristic. Here, for simplicity, it is assumed that the input and output data sizes at each processing stage are the same (no data reduction or increase). In this instance the rate of the slowest link in the chain dictates the overall achievable performance.

If Task 1 is mapped to node 1, Task 2 to node 2 and Task 3 to node 3, the available paths are shown in yellow in Figure 4 for the 10GbE system. The path from Task 1 to Task 2 is over x8 PCIe Gen2, with an available bandwidth of 32 Gbit/s. The path from Task 2 to Task 3 has access to two 10GbE links, an aggregate rate of 20 Gbit/s. Therefore the minimum path is 20 Gbit/s. In the DDR IB system, the path from Task 2 to Task 3 has access to two IB links, an aggregate rate of 32 Gbit/s. The PCIe link is unchanged, so the minimum leg here is 32 Gbit/s. Now, for the SRIO system, with paths between nodes 1 and 2 and between nodes 2 and 3, two separate SRIO links are available for each leg, so 32 Gbit/s is available for both legs.

Figure 4: Shown here is a pipeline dataflow scheme mapped to a 10GbE system, with the Task 1 to Task 2 and Task 2 to Task 3 paths highlighted.

Figure 5: The table summarizes the system analyses for the 10GbE, SRIO and DDR InfiniBand systems. The DDR InfiniBand system matches the performance of the SRIO system for both use cases.

Backplane                      Use Case     10GbE        SRIO         DDR IB       SRIO:10GbE   SRIO:IB
CEN16, 14 payload + 2 switch   All-to-all   190 Gbit/s   304 Gbit/s   304 Gbit/s   1.6x         1x
CEN16, 14 payload + 2 switch   Pipeline     20 Gbit/s    32 Gbit/s    32 Gbit/s    1.6x         1x

The result of all this is that the limiting bandwidths for the pipeline use case are 20 Gbit/s for 10GbE, 32 Gbit/s for DDR IB and 32 Gbit/s for SRIO, as the short calculation below confirms.
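A minimal sketch of the pipeline bottleneck calculation, using the leg aggregates worked out above; the leg names are just labels for this example:

```python
# Pipeline case: with equal data sizes at every stage, throughput is set by
# the slowest leg in the chain.

def pipeline_rate_gbps(legs: dict[str, float]) -> float:
    """Achievable pipeline rate: the minimum-bandwidth leg in the chain."""
    return min(legs.values())

systems = {
    # Task1->Task2 runs over x8 PCIe Gen2 (32 Gbit/s) in the 10GbE/IB cases.
    "10GbE":  {"t1->t2 (PCIe x8)": 32.0, "t2->t3 (2 x 10GbE)": 20.0},
    "DDR IB": {"t1->t2 (PCIe x8)": 32.0, "t2->t3 (2 x IB)":    32.0},
    "SRIO":   {"t1->t2 (2 x SRIO)": 32.0, "t2->t3 (2 x SRIO)": 32.0},
}

for name, legs in systems.items():
    print(f"{name:7s}: {pipeline_rate_gbps(legs):.0f} Gbit/s")  # 20 / 32 / 32
```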
Other Factors to Consider

The push to support open software architectures (MOSA, FACE and so on) is leading the military embedded processing industry to support middleware packages such as the OpenFabrics Enterprise Distribution (OFED) and OpenMPI for data movement. Typically OpenMPI is layered over a network stack, and its performance is highly reliant on how efficiently the layers map to the underlying fabric. Some SRIO implementations rely on rionet, a Linux network driver that presents a TCP/IP interface to SRIO. Contrast this with an OpenMPI implementation that maps through OFED to RDMA over 10GbE or InfiniBand, and it can be seen that the potential exists for a large gap in performance at the application level, with RDMA being much more efficient.

Meanwhile, it is sometimes claimed that SRIO is more power efficient than the other fabrics. If we total up the power of the bridge and switch components for each 16-slot system, a truer picture emerges: the power efficiency of SRIO and DDR IB is on par, with 10GbE fairly close.

Differences Not Significant

Figure 5 summarizes the system analyses for the 10GbE, SRIO and DDR InfiniBand systems. It shows that for both use cases, the simplistic analysis presented elsewhere overestimates the performance advantage of SRIO over 10GbE by a factor of two, and that the advantage is completely attributable to the difference in clock rates; the CEN16 topology has little to no effect in reality. It also shows that the DDR InfiniBand system matches the performance of the SRIO system for both use cases.

GE Intelligent Platforms, Charlottesville, VA.

Reprinted from June 2011 COTS Journal.