Fermi Cluster for Real-Time Hyperspectral Scene Generation
1 Fermi Cluster for Real-Time Hyperspectral Scene Generation
Gary McMillian, Ph.D.
Crossfield Technology LLC, 9390 Research Blvd, Suite I200, Austin, TX
(512) x151
AF SBIR Program, Donald Snyder, III, Program Manager
Funding provided by Frank Carlen, Multi-Spectral Test
2 System Architecture & Approach
- Scenes are generated by heterogeneous processors, then transported over InfiniBand to the projector(s) using the RDMA protocol for high throughput and low latency
- Network interfaces aggregate data from multiple heterogeneous processors in high-speed frame buffers
- Contents of the frame buffers are output to the projector through an FPGA Mezzanine Card (FMC) interface
- IEEE 1588 Precision Time Protocol (PTP) provides global time synchronization
- Heterogeneous processors and projector network interfaces scale independently
7/20/11 Crossfield Technology LLC
3 Scalable System Architecture
(Block diagram: processor nodes with CPU/GPU network interfaces connect over fiber to an InfiniBand switch; network interface adapters drive the projector and HWIL via LVDS and DVI.)
4 HWIL Simulation System
Interconnect bandwidths (from block diagram):
- QuickPath Interconnect (QPI): ~100 Gbps
- PCI Express x8: ~32 Gbps (x16: ~64 Gbps)
- DDR3 SDRAM: ~85 Gbps/ch x 3 ch
- GDDR5 SDRAM: ~192 Gbps/ch x 6 ch
- QDR InfiniBand: ~32 Gbps
- VITA 57.1 / FMC: ~100 Gbps SERDES; LVDS I/O
Diagram elements: projector/HWIL with user-definable PHY and frame synch/request over the FMC; 1U-4U heterogeneous processors (CPUs with DDR3 SDRAM, GPUs with GDDR5 SDRAM, PCIe bridges, SSD, network adapters); 1U Crossfield Network Interface (FPGA with DDR3 SDRAM, network adapter); IEEE 1588 PTP server + Ethernet; InfiniBand switch.
5 REAL-TIME HIGH PERFORMANCE COMPUTER (HPC)
6 Real-Time HPC Requirements
- Deterministic & synchronous: synthesized images complete & ready at the HWIL frame rate
- High floating-point performance: implement physics-based algorithms
- High bandwidth: inter-processor communications for data exchange; stream high-resolution images to the projector at high frame rates
- High memory capacity & performance: processor memory for code, model parameters, data; non-volatile storage for code, model parameters, data, logging
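The bandwidth requirement above can be made concrete with a back-of-the-envelope calculation. This is an illustrative sketch only: the resolution, pixel depth, and frame rate below are assumptions, not figures from the presentation.

```python
# Hedged sketch: raw link bandwidth needed to stream uncompressed frames
# to a projector. All parameter values are illustrative assumptions.

def link_bandwidth_gbps(width, height, bits_per_pixel, frames_per_sec):
    """Raw payload bandwidth in Gb/s for an uncompressed image stream."""
    return width * height * bits_per_pixel * frames_per_sec / 1e9

# Example: a 1024x1024 frame, 16-bit pixels, 400 Hz frame rate.
bw = link_bandwidth_gbps(1024, 1024, 16, 400)
print(f"{bw:.2f} Gb/s")  # -> 6.71 Gb/s
```

At these assumed parameters a single stream already consumes a large fraction of a QDR InfiniBand link (~32 Gbps), which is why the architecture aggregates frame data in high-speed buffers at the network interface.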
7 Intel Xeon Processor Roadmap
- Westmere microarchitecture: 32 nm process, 6 cores, 40 lanes PCI Express Gen 2, 3 channels DDR3
- Sandy Bridge microarchitecture: 32 nm process, 4-8 cores, 40 lanes PCI Express Gen 3, 4 channels DDR3
8 Nvidia CUDA GPU Roadmap (21 SEP 2010)
- Kepler: to be released sometime in 2011, 28 nm process; estimated performance of 4-6 DP GFLOPS/W
- Maxwell: to be released sometime in 2013, 22 nm process; estimated performance of DP GFLOPS/W
9 Nvidia Tesla (Fermi Architecture)
- CUDA programming environment: C/C++, Fortran, OpenCL, Java, Python, or DirectX Compute
- GigaThread engine: 515 GFLOPS double precision, 1030 GFLOPS single precision (C2050/C2070)
- Parallel DataCache technology: 3-6 GB GDDR5 memory, 384-bit bus, ECC option
- GPUDirect with InfiniBand (M2050/M2070)
- PCI Express 2.0 (16 lanes), two DMA engines for bi-directional data transfer
10 Nvidia Tesla Comparison

                                       Tesla C2070   Tesla M2070   Tesla M2090
Peak double precision FP performance   515 GFLOPS    515 GFLOPS    665 GFLOPS
Peak single precision FP performance   1030 GFLOPS   1030 GFLOPS   1331 GFLOPS
CUDA cores                             448           448           512
Memory size (GDDR5)                    6 GB          6 GB          6 GB
Memory bandwidth (ECC off)             144 GB/s      150 GB/s      177 GB/s
Total Dissipated Power (TDP)           247 W         225 W         250 W
Retail price                           $2300         ~$2300        ~$3500
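The table's figures can be turned into the efficiency metric used on the roadmap slide (DP GFLOPS/W). A small sketch, using only values from the comparison table:

```python
# Sketch: DP GFLOPS per watt for each Fermi card in the comparison table,
# for context against the Kepler (4-6 GFLOPS/W) roadmap projection.
cards = {
    "C2070": (515, 247),   # (peak DP GFLOPS, TDP in watts)
    "M2070": (515, 225),
    "M2090": (665, 250),
}
for name, (gflops, watts) in cards.items():
    print(f"{name}: {gflops / watts:.2f} DP GFLOPS/W")
```

The Fermi generation lands around 2-2.7 DP GFLOPS/W, so the projected Kepler figure of 4-6 roughly doubles efficiency.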
11 InfiniBand Roadmap
- SDR - Single Data Rate
- DDR - Double Data Rate
- QDR - Quad Data Rate
- FDR - Fourteen Data Rate
- EDR - Enhanced Data Rate
- HDR - High Data Rate
- NDR - Next Data Rate
12 Mellanox ConnectX-2 Network Adapters
- Nvidia GPUDirect: InfiniBand adapter and Nvidia GPU share a CPU memory region
- OpenFabrics Enterprise Distribution (OFED) software
- Bandwidth: 10G Ethernet; 10/20/40G InfiniBand
- Protocol support: Remote Direct Memory Access (RDMA); OpenMPI, OSU MVAPICH, HP-MPI, Intel MPI, MS MPI, Scali MPI; TCP/UDP, IPoIB, SDP, RDS; SRP, iSER, NFS-RDMA, FCoIB, FCoE
- PCIe 2.0 (8 lanes)
- Performance: 1 µs ping latency; 50M MPI messages/s
13 Mellanox IS5200 InfiniBand Switch
- Non-blocking, full bisectional bandwidth
- Nanosecond-scale latency
- Up to 216 QSFP ports
- Aggregate throughput in the Tb/s range
- 9U cabinet: 6 spine modules, 12 leaf modules
- 1 kW
14 Remote Direct Memory Access (RDMA)
- RDMA enables data to be transferred from one processor's memory to another processor's memory across a network, without significantly involving either operating system
- RDMA supports zero-copy data transfers by enabling the network adapter to transfer data directly to or from application memory, eliminating the need to copy data between application memory and data buffers in the operating system kernel
- RDMA defines READ, WRITE, and SEND/RECEIVE operations
- RDMA adapters support thousands of concurrent transactions using work queues
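The work-queue flow described above can be sketched conceptually. This is a toy model, not real RDMA code: the class and method names are invented for illustration, and a real implementation would use libibverbs (queue pairs, `ibv_post_send`, completion queues). It shows the key semantic: the application posts a work request, the adapter places data directly into a pre-registered remote region, and the application learns of completion by polling a completion queue.

```python
# Conceptual sketch only: verbs-style work queues and direct data placement.
# All names here are invented for illustration; this is not the verbs API.
from collections import deque

class RdmaQueuePair:
    def __init__(self, remote_memory: bytearray):
        self.remote_memory = remote_memory   # stand-in for a registered region
        self.send_queue = deque()            # posted work requests
        self.completion_queue = deque()      # completed work requests

    def post_rdma_write(self, local_buf: bytes, remote_offset: int):
        # Post the work request; the OS kernel is not involved in the data path.
        self.send_queue.append(("RDMA_WRITE", local_buf, remote_offset))

    def process(self):
        # Stand-in for the adapter's DMA engine draining the send queue.
        while self.send_queue:
            op, buf, off = self.send_queue.popleft()
            self.remote_memory[off:off + len(buf)] = buf  # direct placement
            self.completion_queue.append((op, len(buf)))

remote = bytearray(16)
qp = RdmaQueuePair(remote)
qp.post_rdma_write(b"frame001", 0)
qp.process()
print(bytes(remote[:8]), list(qp.completion_queue))
```

The point of the model is that the WRITE lands in the target buffer with no intermediate kernel copy, which is what makes RDMA attractive for streaming frames at projector rates.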
15 OpenFabrics Alliance (OFA) Open Source
(Software-stack diagram spanning user space and kernel space over InfiniBand HCA and iWARP R-NIC hardware.)
- User space: diag tools, OpenSM, user-level MAD API, user-level verbs API (InfiniBand and iWARP R-NIC), IP-based and sockets-based access (SDP library), uDAPL (User Direct Access Programming Library), various MPIs, block storage access, clustered DB access, access to file systems
- Kernel mid-layer: MAD, SMA, SA client, Connection Manager, Connection Manager Abstraction (CMA); kernel-bypass paths for verbs
- Upper layer protocols: IPoIB (IP over InfiniBand), SDP (Sockets Direct Protocol), SRP (SCSI RDMA Protocol, initiator), iSER (iSCSI RDMA Protocol, initiator), RDS (Reliable Datagram Service), NFS-RDMA RPC, cluster file systems
- Management agents: SA (Subnet Administrator), MAD (Management Datagram), SMA (Subnet Manager Agent), PMA (Performance Manager Agent)
- Hardware: hardware-specific drivers for the InfiniBand HCA (Host Channel Adapter) and the iWARP R-NIC (RDMA NIC)
16 GPU Server Options
- 1U server: dual Xeon 5600 processors & 5520 chipsets; three 16-lane + one 8-lane PCIe slots; supports 1-3 M-series GPUs + IB HCA
- 2U server: dual Xeon 5600 processors & 5520 chipsets; four 16-lane + two 8-lane PCIe slots (PLX 8647 switch); supports 1-4 M-series GPUs + IB HCA
- 4U server: dual Xeon 5600 processors & 5520 chipsets; eight 16-lane PCIe slots (4 PLX 8647 switches); supports 4-7 C-series GPUs + IB HCA
17 HPC System Configuration
- 4U servers (64 + 1):
  - Dual 6-core, 2.66 GHz Intel Xeon 5650 (Westmere) CPUs
  - Dual Intel 5520 (Tylersburg-36D) IOH with 6.4 GT/s QPI
  - Four 16-lane PCI Express Gen 2 slots
  - Six 8 GB DDR3 DIMMs (48 GB)
  - Four Nvidia Tesla C2070 (Fermi) GPUs
  - One Mellanox 40G InfiniBand Host Channel Adapter
  - One 300 GB, 10K RPM disk drive
- Mellanox 40G InfiniBand switch (216 ports max)
- Symmetricom IEEE 1588 PTP master clock
- APC Smart-UPS RT 6000VA (18), 76 kW*
- 42U racks (9)
*65 nodes x 1.4 kW/node = 91 kW
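The power footnote above is a simple product; a quick sketch reproduces it and the per-rack load (the nine-rack split is the slide's, the even division is an assumption for illustration):

```python
# Sketch reproducing the slide's power footnote: 65 nodes at ~1.4 kW each.
nodes = 64 + 1            # 64 compute servers + 1 (head node assumption)
kw_per_node = 1.4
total_kw = nodes * kw_per_node
racks = 9
print(f"{nodes} nodes x {kw_per_node} kW = {total_kw:.0f} kW "
      f"(~{total_kw / racks:.1f} kW/rack over {racks} racks)")
```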
18 Advanced HPC System Configuration
- 2U servers (64 + 1):
  - Dual 6-core, 2.66 GHz Intel Xeon 5650 (Westmere) CPUs
  - Dual Intel 5520 (Tylersburg-36D) IOH with 6.4 GT/s QPI
  - Four 16-lane + two 8-lane PCI Express Gen 2 slots (with switch)
  - Six 8 GB DDR3 DIMMs (48 GB)
  - Three Nvidia Tesla M2090 (Fermi) GPUs
  - Two Mellanox 40G InfiniBand Host Channel Adapters
  - One 250 GB SSD (solid-state disk)
- Mellanox 40G InfiniBand switch (216 ports max)
- Symmetricom IEEE 1588 PTP master clock
- APC Symmetra PX SY100K100F UPS (100 kW)
- 42U racks (4+1)
19 Future HPC System Configuration
- 2U servers (64 + 1):
  - Dual 8-core, 2.3 GHz Intel Xeon E5 (Sandy Bridge) CPUs
  - Four 16-lane + two 8-lane PCI Express Gen 3 slots (with switch)
  - Eight 8 GB DDR3 DIMMs (64 GB)
  - Three Nvidia Tesla M2090 (Fermi) GPUs
  - Two Mellanox 56G InfiniBand Host Channel Adapters
  - One 250 GB SSD (solid-state disk)
- Mellanox 56G InfiniBand switch (648 ports max)
- Symmetricom IEEE 1588 PTP master clock
- APC Symmetra PX SY100K100F UPS (100 kW)
- 42U racks (4+1)
20 IEEE 1588 Precision Time Protocol
- IEEE 1588 Precision Time Protocol (PTP) Version 2 overcomes network and application latency and jitter through hardware time stamping at the physical layer of the network.
- IEEE 1588 provides time transfer accuracy down to the sub-100 ns range, a significant improvement in time synchronization accuracy over Network Time Protocol (NTP).
- The Symmetricom XLi Grandmaster is IEEE 1588 PTP V2 compliant and time stamps PTP packets with a time stamp accuracy of 50 ns to UTC. Measured synchronization accuracy at a PTP client has been shown to be as good as a 17 ns offset from the XLi Grandmaster.
- Operating at 100BaseT line speed with deep time stamp packet buffers, the XLi Grandmaster can support thousands of 1588 clients.
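The synchronization itself rests on the standard PTP delay-request exchange: the client combines four timestamps to separate its clock offset from the path delay, assuming a symmetric path. A minimal sketch of that arithmetic (the timestamps below are invented for illustration):

```python
# Standard IEEE 1588 delay-request arithmetic. t1 = master sends Sync,
# t2 = client receives it, t3 = client sends Delay_Req, t4 = master
# receives it. Assumes the network path is symmetric.
def ptp_offset_and_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) - (t4 - t3)) / 2   # client clock minus master clock
    delay  = ((t2 - t1) + (t4 - t3)) / 2   # one-way path delay
    return offset, delay

# Illustrative: client running 500 ns fast over a 200 ns symmetric path.
print(ptp_offset_and_delay(1000, 1700, 2000, 1700))  # -> (500.0, 200.0)
```

Hardware time stamping matters because it makes t2 and t3 reflect the wire, not software queueing, which is what pushes accuracy from NTP's millisecond scale down to the tens of nanoseconds quoted above.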
21 Uninterruptable Power Supply (UPS)
- APC Symmetra PX 100kW
- Scalable to 100kW/100kVA
- 208V 3PH, 332A service
22 APC Symmetra PX Performance (performance chart)
23 HPC Performance

                            Node           System
Cores (CPU/GPU)             12/1536        768/98304
CPU SP FP performance       128 GFLOPS     8 TFLOPS
CPU DP FP performance       64 GFLOPS      4 TFLOPS
GPU SP FP performance       3990 GFLOPS    255 TFLOPS
GPU DP FP performance       1995 GFLOPS    128 TFLOPS
Main memory size            48 GB          3 TB
Main memory BW              64 GB/s        4 TB/s
Disk size                   250 GB         16 TB
Disk IOPS (4 KB)            20K            1.28M
Disk R/W BW                 500/315 MB/s   32/20 GB/s
Network BW                  50 Gb/s        3.2 Tb/s
Power                       1.5 kW         100 kW
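The system column is the node column scaled by 64 compute nodes (treating the 65th server as a head node is an assumption here). A quick sketch verifies the scaling for a few rows:

```python
# Sketch: check node-to-system scaling for 64 compute nodes.
# Node figures match three M2090 GPUs: 3 x 665 = 1995 DP GFLOPS,
# 3 x 512 = 1536 CUDA cores.
node = {"gpu_dp_gflops": 1995, "gpu_cores": 1536, "mem_gb": 48}
system = {k: v * 64 for k, v in node.items()}
print(system)
```

This gives 127,680 GFLOPS (~128 TFLOPS DP), 98,304 GPU cores, and 3,072 GB (~3 TB) of main memory, matching the table.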
24 HPC Procurement Schedule
- Breadboard performance evaluation: 15 JUL
- Finalize HPC configuration: 15 JUL
  - # Fermi processors (4 -> 3); # IB adapters (1 -> 2); UPS (100 kW); server (4U -> 2U); SSD
- Request final vendor quotes: 1 AUG
- HPC vendor selection; issue HPC system purchase order
- HPC system integration & test by vendor: by OCT 31 (6-12 week delivery ARO)
- Installation: by DEC 31; prepare electrical supply for UPS
25 REAL-TIME LINUX
26 Real-Time Operating System (RTOS) Requirements
- No dropped frames during a simulation run
- Support Nvidia's CUDA
- Support InfiniBand adapter with GPUDirect
- Support Precision Time Protocol (PTP) IEEE 1588
Candidate RTOS:
- Concurrent Computer RedHawk
- RedHat MRG (Messaging, Real-Time, Grid)
27 Interrupt Dispatch Latency* (chart)
*Ravi Malhotra, Real-Time Performance on Linux-based Systems, 2011 Freescale Technology Forum
28 Real-Time Support on Linux*
- Traditionally, Linux is not a real-time operating system
- Designed for server throughput performance rather than embedded-systems latency
- Scheduling latencies can be unbounded
- The big kernel lock and other mechanisms (softirq) typically end up blocking real-time-critical tasks
- Processes cannot be pre-empted while executing system calls
*Ravi Malhotra, Real-Time Performance on Linux-based Systems, 2011 Freescale Technology Forum
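One facility stock Linux does offer, and which the RT patch builds on, is the fixed-priority real-time scheduling class. A hedged sketch of requesting it (this needs CAP_SYS_NICE; the fallback keeps the script runnable without privileges, and it is a sketch, not the RedHawk or MRG configuration):

```python
# Hedged sketch: ask the kernel for the SCHED_FIFO real-time class.
# Without CAP_SYS_NICE this raises PermissionError; on platforms without
# the call it raises AttributeError. Both fall back gracefully.
import os

def try_sched_fifo(priority=10):
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return "SCHED_FIFO"
    except (PermissionError, AttributeError, OSError):
        return "SCHED_OTHER (real-time class unavailable)"

print("scheduler:", try_sched_fifo())
```

Under SCHED_FIFO a runnable task preempts all normal-priority work, but it only bounds scheduling latency, not the kernel-side blocking (big kernel lock, softirqs) listed above; eliminating those is what the RT patch addresses.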
29 Sources of Latency & How the RT Patch Helps* (chart)
*Ravi Malhotra, Real-Time Performance on Linux-based Systems, 2011 Freescale Technology Forum
30 HPC PERFORMANCE MODEL
31 Hyperformix Workbench Performance Model (model diagram)
32 Workbench Model Steps
The application consists of 9 steps that comprise the generation and transfer of a frame:
1. Projector requests a frame (provides state data)
2. CPU sets up the frame generation process
3. CPU writes task data to CPU memory (DDR3 SDRAM)
4. CPU tasks the GPU to synthesize the frame
5. GPU reads the task data from CPU memory
6. GPU synthesizes the frame
7. GPU transfers the frame data to CPU memory
8. CPU tasks the InfiniBand network adapters to transfer the frame to the Crossfield Network Interface via the InfiniBand switch
9. Network adapters transfer the frame to FPGA memory using the RDMA protocol
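The nine steps above form a serial latency budget per frame. A toy sketch of that budget: the per-step times below are placeholders except for steps 6 and 9, which use the two response figures that survive on the model-results slide (1000 µs and 2259 µs).

```python
# Illustrative latency budget for the 9-step frame pipeline. All values
# marked "placeholder" are invented; steps 6 and 9 use the surviving
# figures from the Workbench model results.
steps_us = {
    "1 frame request":             10,    # placeholder
    "2-3 setup + write to memory": 50,    # placeholder
    "4 CPU tasks GPU":             10,    # placeholder
    "5 GPU reads task data":       20,    # placeholder
    "6 GPU synthesizes frame":     1000,  # from model results
    "7 frame to CPU memory":       200,   # placeholder
    "8 CPU tasks IB adapter":      10,    # placeholder
    "9 RDMA to NI FPGA memory":    2259,  # from model results
}
total = sum(steps_us.values())
print(f"total {total} us -> max frame rate ~{1e6 / total:.0f} Hz")
```

Even with the placeholder steps near zero, the two modeled steps alone (~3.3 ms) bound the serial frame rate near 300 Hz, which is why the architecture pipelines frames across multiple nodes rather than serializing on one.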
33 Hyperformix Workbench Performance Model (model diagram)
34 Workbench Model Results

Application step                                              Response (µs)
Step 1: frame request from projector                          —
Steps 2-3: setup process and write data to memory             —
Step 4: CPU tasks GPU                                         —
Step 5: GPU reads data from CPU memory                        —
Step 6: GPU synthesizes frame (first transfer)                1000
Step 7: GPU transfers frame to CPU memory                     —
Step 8: CPU tasks network adapter to transfer frame to NI     —
Step 9: network adapter transfers frame to NI FPGA memory     2259
All steps (Main_RT_App)                                       —
35 PROJECTOR INTERFACE
36 Projector Interfaces
FPGA Mezzanine Cards (FMC):
1. Two dual DVI
2. Parallel fiber optic ports (8-10)
3. Digital Micromirror Device (DMD) interface
All modules provide 2 user-definable I/Os, e.g. HWIL synchronization signal, Output Next Frame
More informationThe Future of Interconnect Technology
The Future of Interconnect Technology Michael Kagan, CTO HPC Advisory Council Stanford, 2014 Exponential Data Growth Best Interconnect Required 44X 0.8 Zetabyte 2009 35 Zetabyte 2020 2014 Mellanox Technologies
More informationLustre2.5 Performance Evaluation: Performance Improvements with Large I/O Patches, Metadata Improvements, and Metadata Scaling with DNE
Lustre2.5 Performance Evaluation: Performance Improvements with Large I/O Patches, Metadata Improvements, and Metadata Scaling with DNE Hitoshi Sato *1, Shuichi Ihara *2, Satoshi Matsuoka *1 *1 Tokyo Institute
More informationSupport for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth
Support for GPUs with GPUDirect RDMA in MVAPICH2 SC 13 NVIDIA Booth by D.K. Panda The Ohio State University E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda Outline Overview of MVAPICH2-GPU
More informationMemory Management Strategies for Data Serving with RDMA
Memory Management Strategies for Data Serving with RDMA Dennis Dalessandro and Pete Wyckoff (presenting) Ohio Supercomputer Center {dennis,pw}@osc.edu HotI'07 23 August 2007 Motivation Increasing demands
More informationA first look at 100 Gbps LAN technologies, with an emphasis on future DAQ applications.
21st International Conference on Computing in High Energy and Nuclear Physics (CHEP21) IOP Publishing Journal of Physics: Conference Series 664 (21) 23 doi:1.188/1742-696/664//23 A first look at 1 Gbps
More informationPerformance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA TESLA GPU Cluster
Performance Analysis of Memory Transfers and GEMM Subroutines on NVIDIA TESLA GPU Cluster Veerendra Allada, Troy Benjegerdes Electrical and Computer Engineering, Ames Laboratory Iowa State University &
More informationIntel Enterprise Processors Technology
Enterprise Processors Technology Kosuke Hirano Enterprise Platforms Group March 20, 2002 1 Agenda Architecture in Enterprise Xeon Processor MP Next Generation Itanium Processor Interconnect Technology
More informationOceanStor 9000 InfiniBand Technical White Paper. Issue V1.01 Date HUAWEI TECHNOLOGIES CO., LTD.
OceanStor 9000 Issue V1.01 Date 2014-03-29 HUAWEI TECHNOLOGIES CO., LTD. Copyright Huawei Technologies Co., Ltd. 2014. All rights reserved. No part of this document may be reproduced or transmitted in
More informationCan Memory-Less Network Adapters Benefit Next-Generation InfiniBand Systems?
Can Memory-Less Network Adapters Benefit Next-Generation InfiniBand Systems? Sayantan Sur, Abhinav Vishnu, Hyun-Wook Jin, Wei Huang and D. K. Panda {surs, vishnu, jinhy, huanwei, panda}@cse.ohio-state.edu
More informationFuture Routing Schemes in Petascale clusters
Future Routing Schemes in Petascale clusters Gilad Shainer, Mellanox, USA Ola Torudbakken, Sun Microsystems, Norway Richard Graham, Oak Ridge National Laboratory, USA Birds of a Feather Presentation Abstract
More informationIn-Network Computing. Paving the Road to Exascale. 5th Annual MVAPICH User Group (MUG) Meeting, August 2017
In-Network Computing Paving the Road to Exascale 5th Annual MVAPICH User Group (MUG) Meeting, August 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect CPU-Centric (Onload) Data-Centric
More informationChelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING
Meeting Today s Datacenter Challenges Produced by Tabor Custom Publishing in conjunction with: 1 Introduction In this era of Big Data, today s HPC systems are faced with unprecedented growth in the complexity
More informationExploiting Full Potential of GPU Clusters with InfiniBand using MVAPICH2-GDR
Exploiting Full Potential of GPU Clusters with InfiniBand using MVAPICH2-GDR Presentation at Mellanox Theater () Dhabaleswar K. (DK) Panda - The Ohio State University panda@cse.ohio-state.edu Outline Communication
More informationE4-ARKA: ARM64+GPU+IB is Now Here Piero Altoè. ARM64 and GPGPU
E4-ARKA: ARM64+GPU+IB is Now Here Piero Altoè ARM64 and GPGPU 1 E4 Computer Engineering Company E4 Computer Engineering S.p.A. specializes in the manufacturing of high performance IT systems of medium
More informationNVIDIA GPUDirect Technology. NVIDIA Corporation 2011
NVIDIA GPUDirect Technology NVIDIA GPUDirect : Eliminating CPU Overhead Accelerated Communication with Network and Storage Devices Peer-to-Peer Communication Between GPUs Direct access to CUDA memory for
More informationRDMA in Embedded Fabrics
RDMA in Embedded Fabrics Ken Cain, kcain@mc.com Mercury Computer Systems 06 April 2011 www.openfabrics.org 2011 Mercury Computer Systems, Inc. www.mc.com Uncontrolled for Export Purposes 1 Outline Embedded
More informationIntel Workstation Technology
Intel Workstation Technology Turning Imagination Into Reality November, 2008 1 Step up your Game Real Workstations Unleash your Potential 2 Yesterday s Super Computer Today s Workstation = = #1 Super Computer
More informationN V M e o v e r F a b r i c s -
N V M e o v e r F a b r i c s - H i g h p e r f o r m a n c e S S D s n e t w o r k e d f o r c o m p o s a b l e i n f r a s t r u c t u r e Rob Davis, VP Storage Technology, Mellanox OCP Evolution Server
More informationOCTOPUS Performance Benchmark and Profiling. June 2015
OCTOPUS Performance Benchmark and Profiling June 2015 2 Note The following research was performed under the HPC Advisory Council activities Special thanks for: HP, Mellanox For more information on the
More informationPerformance of HPC Applications over InfiniBand, 10 Gb and 1 Gb Ethernet. Swamy N. Kandadai and Xinghong He and
Performance of HPC Applications over InfiniBand, 10 Gb and 1 Gb Ethernet Swamy N. Kandadai and Xinghong He swamy@us.ibm.com and xinghong@us.ibm.com ABSTRACT: We compare the performance of several applications
More informationFast packet processing in the cloud. Dániel Géhberger Ericsson Research
Fast packet processing in the cloud Dániel Géhberger Ericsson Research Outline Motivation Service chains Hardware related topics, acceleration Virtualization basics Software performance and acceleration
More informationMemcached Design on High Performance RDMA Capable Interconnects
Memcached Design on High Performance RDMA Capable Interconnects Jithin Jose, Hari Subramoni, Miao Luo, Minjia Zhang, Jian Huang, Md. Wasi- ur- Rahman, Nusrat S. Islam, Xiangyong Ouyang, Hao Wang, Sayantan
More informationOptimized Distributed Data Sharing Substrate in Multi-Core Commodity Clusters: A Comprehensive Study with Applications
Optimized Distributed Data Sharing Substrate in Multi-Core Commodity Clusters: A Comprehensive Study with Applications K. Vaidyanathan, P. Lai, S. Narravula and D. K. Panda Network Based Computing Laboratory
More informationSwitchX Virtual Protocol Interconnect (VPI) Switch Architecture
SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture 2012 MELLANOX TECHNOLOGIES 1 SwitchX - Virtual Protocol Interconnect Solutions Server / Compute Switch / Gateway Virtual Protocol Interconnect
More informationEvaluating the Impact of RDMA on Storage I/O over InfiniBand
Evaluating the Impact of RDMA on Storage I/O over InfiniBand J Liu, DK Panda and M Banikazemi Computer and Information Science IBM T J Watson Research Center The Ohio State University Presentation Outline
More informationiscsi or iser? Asgeir Eiriksson CTO Chelsio Communications Inc
iscsi or iser? Asgeir Eiriksson CTO Chelsio Communications Inc Introduction iscsi is compatible with 15 years of deployment on all OSes and preserves software investment iser and iscsi are layered on top
More informationOpenFabrics Alliance Interoperability Logo Group (OFILG) May 2012 Logo Event Report
OpenFabrics Alliance Interoperability Logo Group (OFILG) May 2012 Logo Event Report UNH-IOL 121 Technology Drive, Suite 2 Durham, NH 03824 - +1-603-862-0090 OpenFabrics Interoperability Logo Group (OFILG)
More informationNVMe Takes It All, SCSI Has To Fall. Brave New Storage World. Lugano April Alexander Ruebensaal
Lugano April 2018 NVMe Takes It All, SCSI Has To Fall freely adapted from ABBA Brave New Storage World Alexander Ruebensaal 1 Design, Implementation, Support & Operating of optimized IT Infrastructures
More information