Fabric Bandwidth Comparisons on Backplanes

OpenVPX Backplane Fabric Choice Calls for Careful Analyses

There's a host of factors to consider when evaluating the bandwidth of the various OpenVPX fabrics. A detailed comparison of 10 Gigabit Ethernet, Serial RapidIO and InfiniBand sheds some light.

Peter Thompson, Director of Applications, Military and Aerospace, GE Intelligent Platforms

When evaluating choices between interconnect fabrics and topologies as part of a systems engineering exercise, there are many factors to be considered. Papers have been published that purport to illustrate the advantages of some schemes over others. However, some of these analyses adopt a simplistic model of the architectures that can be misleading when it comes to mapping a real-world problem onto such systems. A more rigorous approach to the analysis is needed in order to derive metrics that are more meaningful, and those metrics differ significantly from the ones produced by a simplistic analytical approach. This article compares the bandwidth available to two common types of dataflow for systems based on the VITA 65 CEN central switched topology, using three different fabrics: Serial RapidIO (SRIO), 10 Gigabit Ethernet (10GbE) and Double Data Rate InfiniBand (DDR IB). The analysis will show that the difference in routing for the three fabrics on the CEN backplane is minimal, that for the use cases presented 10GbE is closer in performance to SRIO than is claimed elsewhere, and that DDR IB matches SRIO.

The System Architecture

For the purposes of this analysis, consider an OpenVPX CEN backplane that allows for fourteen payload (processor) slots and two switch slots (Figure 1). This is a non-uniform topology that routes one connection from each payload slot to each of the two switch slots, plus one connection to the adjacent slot on each side.
First consider a system built from boards that each have two processing nodes (which may be multicore), where each node is connected via Gen2 Serial RapidIO (SRIO) to an onboard switch, which in turn has four connections to the backplane. Each processor has two SRIO connections. That is then compared with a similar system based around dual processor boards with two 10GbE links per node and no onboard switch. If, as in the case of the GE DSP280 multiprocessor board, a Mellanox network interface chip is used, the same board can be software reconfigured to support InfiniBand. In fact, the only change required to migrate from 10GbE to DDR InfiniBand is to change the system central switch from 10GbE (for example, GE's GBX460) to InfiniBand (such as GE's IBX400). The backplane can remain unchanged, as can the payload boards. The interfaces can be software selected as 10GbE or IB. By using appropriate middleware such as AXIS or MPI, the application code remains agnostic to the fabric.

Assumptions and Rate Arithmetic

It is assumed that the switches for all three fabrics are non-blocking for the number of ports that each switch chip supports. However, as will be seen, the number of chips and the hierarchy used to construct a 20-port central switch board can have a significant impact on the true network topology and therefore the bandwidth available to an application. One factor that can be overlooked is that in addition to the primary data fabric connections, there can be an alternate path between nodes on the same board that can be seamlessly leveraged by the application. For example, GE's
DSP280 multiprocessor board has eight lanes of PCIe Gen2 between the two processors via a switch with non-transparent bridging capability. This adds a path with up to 32 Gbit/s of available bandwidth. It's important that the inter-processor communication software is able to leverage mixed data paths within a system. The AXIS tools from GE can do that, and can be used to build a dataflow model that represents the algorithm's needs; the user has complete control over which interconnect mechanism is used for each data link.

Figure 1: An OpenVPX CEN backplane that allows for fourteen payload (processor) slots and two switch slots. This is a non-uniform topology that routes one connection from each payload slot to each of the two switch slots, plus one connection to the adjacent slot on each side. (The original figure also shows the data and expansion planes and the IPMB-based management plane; slot numbers are logical, and physical slot numbers may differ.)

Gen2 SRIO (which was only just starting to emerge on products as of early 2011) runs at 5 GHz with the chipsets commonly in use. A 4-lane connection, with the overhead of 8b/10b encoding, yields a raw rate of 4 x 5 Gbit/s x 0.8 = 16 Gbit/s. 10GbE clocks at 3.125 GHz on 4 lanes with the same 8b/10b encoding, so it has a raw rate of 4 x 3.125 Gbit/s x 0.8 = 10 Gbit/s. DDR InfiniBand clocks at 5 GHz, with a raw rate of 4 x 5 Gbit/s x 0.8 = 16 Gbit/s. Mellanox interface chips that support both 10GbE and IB have been available and deployed for some time now, and are considered a mature technology with widespread adoption in mainstream high-performance computing.

Expanded System Architecture

Now consider a system built from fourteen such boards in an OpenVPX chassis with a backplane that conforms to the BKP6-CEN16-11.2.2-n profile. This supports fourteen payload boards and two central switch boards, and yields a nominal interconnect diagram as shown in Figure 2 for the SRIO case.
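The per-link rate arithmetic above follows one pattern for all three fabrics: lanes, times line rate, times the 8b/10b encoding efficiency. A minimal sketch (the helper function is hypothetical, not from the article):

```python
def link_rate_gbps(lanes: int, gbaud: float, efficiency: float = 0.8) -> float:
    """Raw link rate in Gbit/s: lanes x line rate x 8b/10b efficiency (0.8)."""
    return lanes * gbaud * efficiency

# x4 links for the three fabrics discussed in the article
srio_gen2 = link_rate_gbps(4, 5.0)    # 4 x 5 x 0.8  = 16 Gbit/s
ten_gbe   = link_rate_gbps(4, 3.125)  # 4 x 3.125 x 0.8 = 10 Gbit/s
ddr_ib    = link_rate_gbps(4, 5.0)    # 4 x 5 x 0.8  = 16 Gbit/s
```

Note that SRIO Gen2 and DDR IB end up with identical x4 link rates, which is why the two fabrics track each other throughout the rest of the analysis.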
For 10GbE or InfiniBand, the same backplane results in an interconnect mapping that is represented in Figure 3. Those diagrams do not tell the whole story, however. They would be correct if the central switches shown were constructed from a single, non-blocking, 18- to 20-port switch device. However, this is not the case for all the fabrics. In the 10GbE case, a GBX460 switch card can be used, which employs a single 24-port switch chip. For an InfiniBand system, the IBX400 can be used, which has a single 36-port switch chip where each port is x4 lanes wide. In the case of Gen2 SRIO, the switch chip commonly selected is a device that supports 48 lanes, in other words 12 ports of x4 links. In order to construct a switch of higher order, it is necessary to use several chips in some kind of tree structure. Here a tradeoff must be made between the number of chips used and the overall performance of the aggregated switch.

Reprinted from the June 2011 issue of COTS Journal.

All-to-All Measurement

When evaluating network architectures, a common approach is to look
at an all-to-all exchange of data. This is of interest because it represents a common problem encountered in embedded processing systems: a distributed exchange of matrix data. This is a core function in synthetic aperture radar, for instance, where it is termed a corner turn. It is commonly seen when the processing algorithm calls for a two (or higher) dimensional array to be subjected to a two (or higher) dimensional Fast Fourier Transform. In order to meet system time constraints, the transform is often distributed across many processor nodes. Between the row FFTs and the column FFTs the data must be exchanged between nodes. This requires an all-to-all exchange of data that can tax the available bandwidth of a system.

Figure 2: This interconnect diagram for the Serial RapidIO use case has an OpenVPX backplane that conforms to the BKP6-CEN16-11.2.2-n profile. This supports fourteen payload boards and two central switch boards.

A simple analysis of this topology might make the following assumptions: there are links between nodes on each board via the onboard switch, there are links to nodes on adjacent cards via links between the onboard switches, and there are 22 connections made via the central switches. In this approach, the overall performance for an all-to-all exchange might be assumed to be determined by the lowest aggregate bandwidth of these three connection types, in other words that of a single link divided by the number of connections. This equates to 4 lanes x 5 Gbit/s x 0.8 encoding / 22 nodes = 0.73 Gbit/s per node.

If we apply the same simplistic analysis to the 10GbE system, it suggests that the bandwidth available for all-to-all transfers is 4 lanes x 3.125 Gbit/s x 0.8 encoding x 8 connections between switches / 368 paths, or roughly 0.22 Gbit/s per path. That means SRIO has an apparent speed advantage of 3.4 to 1. However, this is a flawed analysis and gives a misleading impression as to the relative performance that might be expected from the two systems when doing a corner turn.
Note that the two architectures are evaluated with inconsistent methods: one by dividing the worst-case link bandwidth by the number of processors sharing it, and the other by dividing the worst-case aggregate bandwidth by the number of paths that share it.
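The simplistic comparison above can be reproduced in a few lines. This is a sketch of the flawed model the article is criticizing; the 22-connection, 8-link and 368-path figures come from the article, while the elided 10GbE per-path result is an assumption chosen to be consistent with the quoted 3.4-to-1 ratio:

```python
# SRIO: one 16 Gbit/s x4 link per node, shared by 22 central-switch connections
srio_link_gbps = 16.0
srio_per_node = srio_link_gbps / 22          # ~0.73 Gbit/s per node

# 10GbE: eight 10 Gbit/s inter-switch links, shared by 368 paths
tengbe_aggregate = 8 * 10.0
tengbe_per_path = tengbe_aggregate / 368     # ~0.22 Gbit/s per path

# The "apparent" SRIO advantage claimed by the simplistic model
apparent_ratio = srio_per_node / tengbe_per_path   # ~3.3-3.4 to 1
```

Dividing a single link by node count for one fabric, but an aggregate by path count for the other, is exactly the methodological inconsistency the article goes on to correct with bisection bandwidth.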
Figure 3: This interconnect diagram shows the same OpenVPX backplane with the interconnect mapping for 10GbE or InfiniBand.

Architecture Matters

A second potential error is to ignore the internal architecture of each switch device, as this can have an effect in cases where the switch does not have balanced bandwidth across its ports. However, the biggest flaw is the suggestion that the performance of a non-uniform tree architecture can be modeled by deriving the lowest connection bandwidth in the system. In network theory, it is widely accepted that the best metric for the expected performance of such a system is its bisection bandwidth. The bisection bandwidth of a system is found by dividing the system into two equal halves along a dividing line, and enumerating the rate at which data can be communicated between the two halves. Reconsidering the network diagram of the SRIO system, the bisection width is defined by the number of paths that the median line crosses, which adds up to 19. Similarly, the bisection width of the 10GbE or DDR IB system also adds up to 19. Given that the link bandwidth for the SRIO system is 16 Gbit/s and for 10GbE is 10 Gbit/s, the bisection bandwidth of the SRIO system is 19 x 16 = 304 Gbit/s, and for the 10GbE system it is 19 x 10 = 190 Gbit/s. This represents an expected performance ratio for the total exchange scenario of 1.6 to 1 in favor of the SRIO system, not the 3.4 to 1 predicted in the simplistic model. If we now replace the 10GbE switch with an InfiniBand switch, which fits the same slot and backplane profiles, the bisection bandwidth is 19 x 16 = 304 Gbit/s. Therefore the performance of DDR InfiniBand matches that of SRIO.

Bandwidth Calculations: Pipeline Case

Another dataflow model commonly considered is a pipeline, where data streams from node to node in a linear manner. When designing such a dataflow, it is normal to map the tasks and flow to the system in an optimal manner. This can include using different fabric connections for different parts of the flow. A good IPC library and infrastructure will allow the designer to do so without requiring any modifications to the application code.
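The bisection-bandwidth calculation above reduces to one multiplication per fabric. A minimal sketch using the article's figures (19 links crossing the median cut of the CEN backplane):

```python
def bisection_bw_gbps(links_crossing: int, link_rate_gbps: float) -> float:
    """Bisection bandwidth: links crossing the median cut x per-link rate."""
    return links_crossing * link_rate_gbps

srio_bb   = bisection_bw_gbps(19, 16.0)   # 304 Gbit/s
tengbe_bb = bisection_bw_gbps(19, 10.0)   # 190 Gbit/s
ddr_ib_bb = bisection_bw_gbps(19, 16.0)   # 304 Gbit/s

ratio = srio_bb / tengbe_bb               # 1.6, not the simplistic model's 3.4
```

Because the three fabrics route identically on the CEN backplane, the link count is the same for all three and the 1.6 ratio is purely the ratio of the per-link rates (16/10).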
AXIS has this characteristic. Here, for simplicity, it is assumed that the input and output data sizes at each processing stage are the same (no data reduction or increase). In this instance the rate of the slowest link in the chain dictates the overall achievable performance.
If Task 1 is mapped to node 1, Task 2 to node 2 and Task 3 to node 3, the available paths are shown in yellow in Figure 4 for the 10GbE system. The path from Task 1 to Task 2 is over x8 PCIe Gen2, with an available bandwidth of 32 Gbit/s. The path from Task 2 to Task 3 has access to two 10GbE links, an aggregate rate of 20 Gbit/s. Therefore the minimum path is 20 Gbit/s. In the DDR IB system, the path from Task 2 to Task 3 has access to two IB links, an aggregate rate of 32 Gbit/s. The PCIe link is unchanged, so the minimum leg here is 32 Gbit/s. Now, for the SRIO system, with paths between nodes 1 and 2 and between nodes 2 and 3, two separate SRIO links are available, so 32 Gbit/s is available for both legs.

Figure 4: Shown here is a pipeline dataflow scheme mapped to a 10GbE system.

Figure 5: The table summarizes the system analyses for the 10GbE, SRIO and DDR InfiniBand systems. The DDR InfiniBand system matches the performance of the SRIO system for both use cases.

Backplane                    Use Case    10GbE       SRIO        DDR IB      SRIO:10GbE  IB:SRIO
CEN (14 payload, 2 switch)   All-to-all  190 Gbit/s  304 Gbit/s  304 Gbit/s  1.6x        1x
CEN (14 payload, 2 switch)   Pipeline    20 Gbit/s   32 Gbit/s   32 Gbit/s   1.6x        1x

The result of all this is that the limiting bandwidths for the pipeline use case are 20 Gbit/s for 10GbE, 32 Gbit/s for DDR IB and 32 Gbit/s for SRIO.

Other Factors to Consider

The push to support open software architectures (MOSA, FACE and so on) is leading the military embedded processing industry to support middleware packages such as the OpenFabrics Enterprise Distribution (OFED) and OpenMPI for data movement. Typically OpenMPI is layered over a network stack, and its performance is highly reliant on how efficiently the layers map to the underlying fabric. Some SRIO implementations rely on rionet, a Linux network driver that presents a TCP/IP interface to SRIO. Contrast this with an OpenMPI implementation that maps through OFED to RDMA over 10GbE or InfiniBand, and it can be seen that the potential exists for a large gap in performance at the application level, with RDMA being much more efficient.
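The pipeline analysis above is simply a minimum over the per-leg rates. A short sketch (the leg values are the article's; the helper is hypothetical):

```python
def pipeline_limit_gbps(leg_rates_gbps):
    """With equal data sizes per stage, throughput is set by the slowest leg."""
    return min(leg_rates_gbps)

# Task1->Task2, Task2->Task3 leg bandwidths per fabric:
tengbe = pipeline_limit_gbps([32.0, 2 * 10.0])    # x8 PCIe, then two 10GbE links -> 20
ddr_ib = pipeline_limit_gbps([32.0, 2 * 16.0])    # x8 PCIe, then two IB links   -> 32
srio   = pipeline_limit_gbps([2 * 16.0, 2 * 16.0])  # two SRIO links on both legs -> 32
```

This reproduces the 20/32/32 Gbit/s limits in the summary table, and again the 10GbE shortfall traces back to the per-link rate, not the topology.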
Meanwhile, it is sometimes claimed that SRIO is more power efficient than the other fabrics. If we total up the power of the bridge and switch components for each 16-slot system, a truer picture emerges: the power efficiency of SRIO and DDR IB is on par, with 10GbE fairly close.

Differences Not Significant

Figure 5 summarizes the system analyses for the 10GbE, SRIO and DDR InfiniBand systems. It shows that for both use cases, the simplistic analysis presented elsewhere overestimates the performance advantage of SRIO over 10GbE by a factor of two, and that the advantage is completely attributable to the difference in link clock rates; the CEN topology has little to no effect in reality. It also shows that the DDR InfiniBand system matches the performance of the SRIO system for both use cases.

GE Intelligent Platforms, Charlottesville, VA.
New! New! New! New! New! Model 5950 Features Supports Xilinx Zynq UltraScale+ RFSoC FPGAs 18 GB of DDR4 SDRAM On-board GPS receiver PCI Express (Gen. 1, 2 and 3) interface up to x8 LVDS connections to
More informationGW2000h w/gw175h/q F1 specifications
Product overview The Gateway GW2000h w/ GW175h/q F1 maximizes computing power and thermal control with up to four hot-pluggable nodes in a space-saving 2U form factor. Offering first-class performance,
More informationLeveraging the Intel HyperFlex FPGA Architecture in Intel Stratix 10 Devices to Achieve Maximum Power Reduction
white paper FPGA Leveraging the Intel HyperFlex FPGA Architecture in Intel Stratix 1 s to Achieve Maximum Reduction devices leverage the innovative Intel HyperFlex FPGA architecture to achieve power savings
More informationCRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart
CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, Hao Wang, Jian Huang, Dhabaleswar K. Panda Department of Computer
More informationSolaris Engineered Systems
Solaris Engineered Systems SPARC SuperCluster Introduction Andy Harrison andy.harrison@oracle.com Engineered Systems, Revenue Product Engineering The following is intended to outline
More informationMeltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies
Meltdown and Spectre Interconnect Evaluation Jan 2018 1 Meltdown and Spectre - Background Most modern processors perform speculative execution This speculation can be measured, disclosing information about
More informationHigh Performance Ethernet for Grid & Cluster Applications. Adam Filby Systems Engineer, EMEA
High Performance Ethernet for Grid & Cluster Applications Adam Filby Systems Engineer, EMEA 1 Agenda Drivers & Applications The Technology Ethernet Everywhere Ethernet as a Cluster interconnect Ethernet
More informationAXIe : AdvancedTCA Extensions for Instrumentation and Test. Autotestcon 2016
AXIe : AdvancedTCA Extensions for Instrumentation and Test Autotestcon 2016 Copyright 2016 AXIe Consortium, Inc. * AdvancedTCA is a registered trademark of PICMG. AXIe is a registered trademark of the
More informationFPGA Solutions: Modular Architecture for Peak Performance
FPGA Solutions: Modular Architecture for Peak Performance Real Time & Embedded Computing Conference Houston, TX June 17, 2004 Andy Reddig President & CTO andyr@tekmicro.com Agenda Company Overview FPGA
More informationMELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구
MELLANOX EDR UPDATE & GPUDIRECT MELLANOX SR. SE 정연구 Leading Supplier of End-to-End Interconnect Solutions Analyze Enabling the Use of Data Store ICs Comprehensive End-to-End InfiniBand and Ethernet Portfolio
More informationBuilding the Most Efficient Machine Learning System
Building the Most Efficient Machine Learning System Mellanox The Artificial Intelligence Interconnect Company June 2017 Mellanox Overview Company Headquarters Yokneam, Israel Sunnyvale, California Worldwide
More informationThe VITA Radio Transport as a Framework for Software Definable Radio Architectures
The VITA Radio Transport as a Framework for Software Definable Radio Architectures Robert Normoyle (DRS Signal Solutions, Gaithersburg, Md; Robert.Normoyle@DRS-SS.com); and Paul Mesibov (Pentek, Inc. Upper
More informationVPI / InfiniBand. Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability
VPI / InfiniBand Performance Accelerated Mellanox InfiniBand Adapters Provide Advanced Data Center Performance, Efficiency and Scalability Mellanox enables the highest data center performance with its
More informationModule 2 Storage Network Architecture
Module 2 Storage Network Architecture 1. SCSI 2. FC Protocol Stack 3. SAN:FC SAN 4. IP Storage 5. Infiniband and Virtual Interfaces FIBRE CHANNEL SAN 1. First consider the three FC topologies pointto-point,
More informationSurvey of ETSI NFV standardization documents BY ABHISHEK GUPTA FRIDAY GROUP MEETING FEBRUARY 26, 2016
Survey of ETSI NFV standardization documents BY ABHISHEK GUPTA FRIDAY GROUP MEETING FEBRUARY 26, 2016 VNFaaS (Virtual Network Function as a Service) In our present work, we consider the VNFaaS use-case
More informationFibre Channel over Ethernet and 10GBASE-T: Do More with Less
White Paper Fibre Channel over Ethernet and 10GBASE-T: Do More with Less What You Will Learn Over the past decade, data centers have grown both in capacity and capabilities. Moore s Law which essentially
More informationIntel Enterprise Processors Technology
Enterprise Processors Technology Kosuke Hirano Enterprise Platforms Group March 20, 2002 1 Agenda Architecture in Enterprise Xeon Processor MP Next Generation Itanium Processor Interconnect Technology
More informationUnified Runtime for PGAS and MPI over OFED
Unified Runtime for PGAS and MPI over OFED D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University, USA Outline Introduction
More informationWhat is Parallel Computing?
What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing
More informationRoCE vs. iwarp Competitive Analysis
WHITE PAPER February 217 RoCE vs. iwarp Competitive Analysis Executive Summary...1 RoCE s Advantages over iwarp...1 Performance and Benchmark Examples...3 Best Performance for Virtualization...5 Summary...6
More informationEnsemble 6000 Series OpenVPX Intel 4 th Generation Core i7 module LDS6525-CX
DATASHEET Ensemble 6000 Series OpenVPX Intel 4 th Generation Core i7 module LDS6525-CX 4 th Gen Intel Quad-Core i7 LDS module with CPU on-die GPU and mezzanine sites 4 th Gen Intel Quad-Core i7 processor
More informationDeep Learning Performance and Cost Evaluation
Micron 5210 ION Quad-Level Cell (QLC) SSDs vs 7200 RPM HDDs in Centralized NAS Storage Repositories A Technical White Paper Don Wang, Rene Meyer, Ph.D. info@ AMAX Corporation Publish date: October 25,
More informationLinköping University Post Print. epuma: a novel embedded parallel DSP platform for predictable computing
Linköping University Post Print epuma: a novel embedded parallel DSP platform for predictable computing Jian Wang, Joar Sohl, Olof Kraigher and Dake Liu N.B.: When citing this work, cite the original article.
More informationFlex System IB port FDR InfiniBand Adapter Lenovo Press Product Guide
Flex System IB6132 2-port FDR InfiniBand Adapter Lenovo Press Product Guide The Flex System IB6132 2-port FDR InfiniBand Adapter delivers low latency and high bandwidth for performance-driven server clustering
More informationIntel Xeon Sandy Bridge Server-Class Processor DATASHEET
DATASHEET Ensemble 6000 Series OpenVPX, Intel 3 rd Generation Xeon 10-core, Ethernet/Infiniband High Density Server-Class HDS6602 Processing Module Most Powerful, Rugged, Single Slot Intel Server-Class
More informationParallel Computing: Parallel Architectures Jin, Hai
Parallel Computing: Parallel Architectures Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Peripherals Computer Central Processing Unit Main Memory Computer
More informationGPM0001 E9171 GPU-based Processor Module
GPM0001 E9171 GPU-based Processor Module DO-254 Certifiable 3U VPX Graphics/Compute Module IP Features and Benefits Part of the COTS-D family of safety certifiable modules A compact GPU Processing Module
More informationVersion PEX 8516
Version 1.4 2006 PEX 8516 Not recommended for new designs please use PEX8518 for new designs Version 1.4 2006 Features PEX 8516 General Features o 16-lane PCI Express switch - Integrated SerDes o Up to
More informationIntroduction to High-Speed InfiniBand Interconnect
Introduction to High-Speed InfiniBand Interconnect 2 What is InfiniBand? Industry standard defined by the InfiniBand Trade Association Originated in 1999 InfiniBand specification defines an input/output
More information