Evolving HPC Solutions Using Open Source Software & Industry-Standard Hardware

CLUSTER TO CLOUD
Evolving HPC Solutions Using Open Source Software & Industry-Standard Hardware

Carl Trieloff, cctrieloff@redhat.com, Red Hat, Technical Director
Lee Fisher, lee.fisher@hp.com, Hewlett-Packard, WW FSI-HPC Business Development

Financial compute-to-cloud example

[Diagram: a scale-up internal grid and a scale-out scheduler tier linked by
messaging; the trader's latency-sensitive trade-execution path spans the
internal private cloud and an external public cloud.]

What are some of the requirements?

Cloud computing is a hot topic, but many people have important questions and
challenges they need addressed before they can adopt cloud:

- How do I build an internal cloud?
- How do I avoid lock-in to a single cloud?
- How do I deal with homogeneous and non-homogeneous hardware requirements?
- How do I mix, match, and blend different cloud resources, including
  internal and external clouds?
- How do I manage a variety of applications and groups with different SLAs,
  priorities, and resource requirements across clouds?
- How do I manage software licensing and hardware limits?
- How do I abstract resource management, accounting, and permissions?

Red Hat Enterprise MRG

An integrated platform for high-performance distributed computing:

- Messaging: high-speed, interoperable, open-standard (AMQP)
- Realtime: deterministic, low-latency kernel
- Grid: high-performance and high-throughput computing; grid scheduler for
  distributed workloads and cloud computing
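As a concrete anchor for the Messaging component, here is a minimal sketch
using the python-qpid (qpid.messaging) API that ships alongside MRG
Messaging. The broker address and queue name are placeholders, not values
from the deck.

    # Minimal send/receive against an MRG Messaging (qpid) broker.
    # "{create: always}" auto-creates the queue if it does not exist.
    from qpid.messaging import Connection, Message

    conn = Connection("broker.example.com:5672")   # hypothetical broker
    conn.open()
    try:
        session = conn.session()
        sender = session.sender("trades; {create: always}")
        receiver = session.receiver("trades; {create: always}")

        sender.send(Message("BUY 100 XYZ @ 20.50"))
        msg = receiver.fetch(timeout=5)
        print("received: " + str(msg.content))
        session.acknowledge()                      # reliable, acknowledged delivery
    finally:
        conn.close()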

AMQP messaging on an 8-node HP Nehalem cluster with 40 Gb/s InfiniBand: over
11 million messages/sec

[Chart: messages/sec (up to roughly 7M per server) vs. number of brokers per
server (4, 2, 1), comparing HP G6 Nehalem against HP G5 Harpertown blades;
the G6 improvement ratio is roughly 3.1x at each broker count.]

AMQP scale-up on a single HP Nehalem BL460c with 40 Gb/s InfiniBand (AMQP
perftest)

[Chart: messages/sec vs. number of brokers on the server (8, 4, 2, 1) for
message sizes of 8, 64, 256, and 1024 bytes, peaking above 10 million
messages/sec.]

Test configuration: two BL460c G6 blades, each with two Intel Xeon X5570
CPUs (Nehalem, 2.93 GHz, 8 MB L3 cache, 95 W); 24 GB memory (6 x 4 GB
DDR3-1333), HT on, Turbo 2/2/3/3; InfiniBand 4X QDR dual-port mezzanine HCAs
(one port connected); BLc 4X QDR InfiniBand switch.
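For orientation, a rough single-connection throughput probe is sketched
below. It is a much-simplified stand-in for the perftest harness behind the
chart (those numbers come from many producers spread across several
brokers); the broker address is a placeholder.

    # Time N pipelined sends and report messages/sec.
    import time
    from qpid.messaging import Connection, Message

    N = 100000
    conn = Connection("broker.example.com:5672")   # hypothetical broker
    conn.open()
    try:
        session = conn.session()
        sender = session.sender("perf; {create: always}")
        payload = "x" * 64                 # 64-byte body, one of the chart's sizes
        start = time.time()
        for i in range(N):
            # async sends pipeline; the final sync send flushes everything
            sender.send(Message(payload), sync=(i == N - 1))
        elapsed = time.time() - start
        print("%.0f msgs/sec" % (N / elapsed))
    finally:
        conn.close()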

KVM performance: within ~5% of bare metal

AMQP messaging on Intel Nehalem with dual 10 Gbit NICs and VT-d: over 1
million messages/sec on RHEL 5.4 KVM with two guests.

[Chart: messages/sec and throughput (MB/sec) vs. message size from 16 to
4096 bytes; the message rate falls from roughly 1,046,000/sec at 16 bytes to
roughly 210,000/sec at 4096 bytes.]
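The setup behind near-native guest I/O here is VT-d PCI passthrough, i.e.
giving the guest direct ownership of a host NIC. A sketch using the real
libvirt Python bindings; the guest name and PCI address (0000:07:00.0) are
placeholders for your environment.

    # Hot-attach a host NIC to a running KVM guest via PCI passthrough.
    import libvirt

    HOSTDEV_XML = """
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    """

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("mrg-guest-1")   # hypothetical guest name
    dom.attachDevice(HOSTDEV_XML)            # guest now drives the NIC directly
    conn.close()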

MRG scheduling resources: http://www.youtube.com/watch?v=osm7ff8kkjk

MRG Messaging InfiniBand RDMA latency: under 40 microseconds, reliably
acknowledged

[Chart: MRG Messaging latency test on HP BL460c G6 over InfiniBand at a 100K
message rate; average latency for 32-, 256-, and 1024-byte RDMA messages on
Nehalem stays between roughly 0.034 and 0.048 ms.]
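To make the measurement concrete, below is a crude round-trip latency probe.
The slide's figures come from the MRG latency harness over InfiniBand RDMA;
this sketch runs over plain TCP and only illustrates the idea. The broker
address is a placeholder.

    # Send a 32-byte message, fetch it back, and record the round trip.
    import time
    from qpid.messaging import Connection, Message

    conn = Connection("broker.example.com:5672")
    conn.open()
    try:
        session = conn.session()
        sender = session.sender("lat; {create: always}")
        receiver = session.receiver("lat; {create: always}")
        samples = []
        for _ in range(1000):
            t0 = time.time()
            sender.send(Message("x" * 32))      # 32-byte body, as on the slide
            receiver.fetch(timeout=5)           # pull it back off the queue
            session.acknowledge()               # reliable acknowledgement
            samples.append((time.time() - t0) * 1e6)   # microseconds
        samples.sort()
        print("median %.1f us, p99 %.1f us"
              % (samples[len(samples) // 2], samples[int(len(samples) * 0.99)]))
    finally:
        conn.close()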

Components of the solution stack

Solutions still matter in an industry-standard, open source world. The
FSI-HPC solution stack spans every layer, from users and services down to
the hardware:

- Application environment / workload: tuning and joint work in the labs,
  with the Red Hat MRG tuning tools
- Middleware: Red Hat MRG and RHEV (Messaging / Grid / Virtualization)
- Integrated systems, server interconnect, and L2 fabric: HP Voltaire and
  Red Hat RDMA
- Operating system: Red Hat MRG Realtime and KVM
- BIOS: HP reduced-SMI BIOS options
- x86-64 server architecture: HP compute and storage

Determinism and performance need to work at each layer; HP and Red Hat are
partnered across the stack.

Hardware matters

- Scale-up: blades
- Scale-out: rack-optimized SL6000
- Today's RFP metrics: performance/watt, performance/BTU, performance/rack
- HP Low Latency Lab with MRG; Red Hat MRG lab with HP BL460/BL685 and
  InfiniBand

Dealing with SMIs: HP BIOS option for low-latency apps

- Disables the frequent SMIs used for Dynamic Power Savings Mode, CPU
  utilization monitoring, P-state monitoring, and ECC reporting
- Benefits both the RHEL and MRG operating environments

[Charts: latency spikes with standard BIOS settings vs. latencies with SMIs
disabled in the BIOS.]
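One way to verify the effect of this BIOS option is hwlatdetect from the
rt-tests package, which polls for hardware-induced gaps (typically SMIs)
that the OS cannot account for. A sketch only: the flags shown are my
recollection of the tool's options, so check `hwlatdetect --help` on your
system.

    # Run hwlatdetect before and after toggling the HP low-latency BIOS
    # option and compare the reported maximum gap.
    import subprocess

    result = subprocess.run(
        ["hwlatdetect", "--duration=60", "--threshold=10us"],
        capture_output=True, text=True, check=False)
    print(result.stdout)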

MRG Realtime: RHEL on HP systems

- Enables applications and transactions to run predictably, with guaranteed
  response times
- Upgrades RHEL 5 to a realtime OS: provides a replacement kernel for RHEL 5
  on x86/x86_64
- Preserves RHEL application compatibility
- Certified on HP hardware; see the Red Hat / HP certifications

[Chart: response time vs. time.]
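On a realtime kernel, deterministic response comes from putting the hot path
in a real-time scheduling class. A minimal sketch (Linux, Python 3.3+,
requires CAP_SYS_NICE); priority 80 is an arbitrary example value, not a
recommendation from the deck.

    # Move this process into SCHED_FIFO so it preempts normal tasks.
    import os

    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(80))  # pid 0 = self
    # ... latency-critical work here runs ahead of SCHED_OTHER tasks ...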

MRG Realtime scheduling latency

                   Vanilla    MRG RT
    Min                  1         4
    Max               2857        43
    Mean             11.47      8.34
    Mode              9.00      8.00
    Median            9.00      8.00
    Std. deviation   54.94      1.49

Networking matters

Voltaire DDR and QDR InfiniBand: 36 QDR QSFP ports, Ethernet management
port, LEDs, USB port, serial port.

Test configuration:
- Two Nehalem-based servers with ConnectX PCI-E HCAs, connected back-to-back
- QDR ConnectX HCA running at QDR; DDR ConnectX HCA running at DDR
- RHEL 5 Update 2, Mellanox verbs performance test

RoEE (RDMA over Enhanced Ethernet): defined as a verbs-compliant InfiniBand
transport running over the emerging IEEE Converged Enhanced Ethernet
standard. www.openfabrics.org/archives/spring2009sonoma/monday/grun.pdf

Building cloud capabilities with MRG

- Scalable virtualization: schedule VMs directly as jobs via libvirt;
  provision VMs via Red Hat Enterprise Virtualization; inject jobs into VMs
  (a submit sketch follows this list)
- Resource accounting: track resources via Condor's resource accounting;
  apply priorities and policies; apply security, including authentication
  (e.g. SSL), integrity, and encryption
- Powerful policies and SLAs: VMs can run as multiple concurrent instances,
  start on Black Friday or semi-monthly, and re-run after a fault; machines
  can run only VMs from the owner's group between 9 and 5, with everyone
  else getting a low-priority shot from 5 to 9; global control limiters
  apply (e.g. NFS mount users, licenses)
- Various cloud services: IaaS clouds run all workloads as VMs; PaaS clouds
  combine job scheduling with VM scheduling
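Here is a sketch of scheduling a VM directly as a Condor/MRG Grid job. The
"vm" universe is real Condor syntax, but treat the exact attribute names
(vm_type, vm_memory, vm_disk) and the disk-spec format as assumptions to
verify against the MRG Grid manual; the image path is a placeholder.

    # Write a VM-universe submit description and hand it to condor_submit.
    import subprocess, textwrap

    submit = textwrap.dedent("""\
        universe  = vm
        vm_type   = kvm
        vm_memory = 1024
        vm_disk   = /var/lib/images/guest.qcow2:vda:w
        queue
    """)
    with open("vm.submit", "w") as f:
        f.write(submit)
    subprocess.run(["condor_submit", "vm.submit"], check=True)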

Aggregating and bridging clouds

MRG includes the ability to schedule jobs and applications across multiple
clouds, based on policy:

- MRG can send VMs to other resource managers
- MRG becomes the unified interface to many types of resources: internal VM
  resources and multiple external clouds
- MRG's life-cycle management, accounting, and policy benefits remain
  available

Use cases include:
- Managing overflow/spillover
- Access to specialized resource managers
- Transformation between VM types/systems
- Allowing a single app/stack to bridge multiple clouds

MRG cloud aggregation architecture

- The schedd accepts jobs over SOAP, AMQP, and the CLI
- GAHP (Grid ASCII Helper Protocol): an adapter to an external resource
  manager; exists for many batch systems and for EC2-like resource managers,
  and is extensible to new resource managers (a submit sketch follows)
- The Job Router transforms job types, e.g. stack to VM to EC2 AMI
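Below is a sketch of spilling a job out to an EC2-like cloud through the EC2
GAHP. The grid universe with `grid_resource = ec2` is standard Condor
syntax; the AMI id, credential file paths, and service URL are placeholders.

    # Write a grid-universe submit description targeting EC2 and submit it.
    import subprocess, textwrap

    submit = textwrap.dedent("""\
        universe              = grid
        grid_resource         = ec2 https://ec2.amazonaws.com/
        ec2_ami_id            = ami-00000000
        ec2_access_key_id     = /home/user/ec2.access
        ec2_secret_access_key = /home/user/ec2.secret
        executable            = overflow-job
        queue
    """)
    with open("ec2.submit", "w") as f:
        f.write(submit)
    subprocess.run(["condor_submit", "ec2.submit"], check=True)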

Durable messaging throughput comparison

[Chart: MRG durable messaging throughput across different storage types.
Series: one NIC, non-durable; one NIC, durable to an HP IO Accelerator
(Fusion-io) card; one NIC, durable to fiber disk; one NIC, durable to an
internal SCSI drive. Message rates reach roughly 700,000/sec.]

Test configuration: Intel 16-CPU Harpertown, 12 GB memory at 667 MHz memory
speed, Intel 82571EB Gigabit Ethernet, HP IO Accelerator (Fusion-io),
32-byte messages.

Durable messaging latency comparison

[Chart: latency test with a durable store across different storage types.
Series: one NIC, non-durable; one NIC, IOFusion durable; one NIC, fiber
durable; one NIC, SATA durable. Average latency stays between 0.0 and
1.4 ms.]

Test configuration: Intel 16-CPU Harpertown, 12 GB memory at 667 MHz memory
speed, Intel 82571EB Gigabit Ethernet, HP IO Accelerator (Fusion-io),
32-byte messages.
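Durability in qpid is requested per message: mark the message durable and
send it to a durable queue, and the broker journals it to the storage device
being compared above before acknowledging. A minimal sketch; the broker
address and queue name are placeholders.

    # Send a durable message to a durable queue.
    from qpid.messaging import Connection, Message

    conn = Connection("broker.example.com:5672")
    conn.open()
    try:
        session = conn.session()
        sender = session.sender("orders; {create: always, node: {durable: True}}")
        sender.send(Message("fill order 42", durable=True))  # journaled send
    finally:
        conn.close()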

MRG template for HP Matrix

HP BladeSystem Matrix enables:
- Automated provisioning to speed deployment
- Capacity planning to optimize workloads dynamically
- Simplified disaster recovery

Red Hat and HP are developing an MRG template for Matrix to quickly stand up
'internal cloud' deployments with workflows, scripts, and best-practice
templates. www.hp.com/go/matrixtemplates

Testing and developing solutions, working together

Results are delivered in reference papers and certifications.

[Chart from a Red Hat / HP white paper: throughput and memory usage (cache,
buffers, free) compared across 1-GigE, 10-GigE, IPoIB, IB SDP, and IB RDMA.]

Additional information

www.redhat.com/mrg
www.hp.com/go/realtimelinux

THANK YOU