FROM HPC TO THE CLOUD WITH AMQP AND OPEN SOURCE SOFTWARE
Carl Trieloff, cctrieloff@redhat.com, Red Hat
Lee Fisher, lee.fisher@hp.com, Hewlett-Packard
High Performance Computing on Wall Street conference, 14 September 2009
From simulation to trade
[Diagram: a Grid scheduler and Messaging layer connect an internal pool (scale up), another internal division, and external resources such as EC2 (scale out); a trader submits latency-sensitive trades through the messaging layer.]
Red Hat Enterprise MRG
Integrated platform for high-performance distributed computing:
- Messaging: high-speed, interoperable, open-standard messaging
- Realtime: deterministic, low-latency realtime kernel
- Grid: high-performance and high-throughput computing; grid scheduler for distributed workloads and cloud computing
AMQP, HP Performance: scale up
[Chart: AMQP perftest on a single HP Nehalem BL460c blade over 40Gb InfiniBand; messages/sec (up to ~12M) vs. number of brokers on the server (1, 2, 4, 8) for 8-, 64-, 256-, and 1024-byte messages.]
Test configuration:
- Two Intel Xeon X5570 CPUs @ 2.93 GHz per blade (Nehalem, 8MB L3 cache, 95W)
- Memory: 24GB (6x4GB), DDR3-1333, HyperThreading, Turbo 2/2/3/3
- InfiniBand: 4X QDR IB dual-port mezzanine HCAs (1 port connected)
- InfiniBand switch: BLc 4X QDR IB switch
AMQP Messaging on 8-node HP Nehalem, 40Gbps InfiniBand: > 11M msgs/sec
[Chart: messages/sec vs. brokers per server (1, 2, 4) for Nehalem and Harpertown blades; Nehalem outperforms Harpertown by roughly 2.5x to 3.1x.]
KVM Performance: AMQP Messaging on Intel Nehalem, 2x 10Gbit, VT-d: > 1M msgs/sec
[Chart: RHEL 5.4 KVM, AMQP with 2 guests; messages/sec and throughput (MB/sec) vs. message size from 16 to 4096 bytes; peak just over 1M msgs/sec at small message sizes.]
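Charts like the one above report both a message rate and a throughput in MB/sec; the two are related by the message size. A minimal sketch of that conversion (Python, illustrative only; it counts payload bytes and ignores per-frame AMQP and network overhead):

```python
def throughput_mb_per_sec(msgs_per_sec: float, msg_size_bytes: int) -> float:
    """Convert a message rate and payload size into MB/sec of payload moved.

    Uses decimal megabytes (1 MB = 10**6 bytes). Wire overhead per AMQP
    frame is ignored, so real network utilization is somewhat higher.
    """
    return msgs_per_sec * msg_size_bytes / 1_000_000

# Example: ~1M msgs/sec of 64-byte messages is ~64 MB/sec of payload.
print(throughput_mb_per_sec(1_000_000, 64))  # 64.0
```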
MRG Messaging InfiniBand RDMA Latency: under 40 microseconds, reliably acknowledged
[Chart: MRG Messaging latency test on HP BL460c G6 InfiniBand at a 100K message rate; average latency (ms) for 32-, 256-, and 1024-byte RDMA messages on Nehalem, roughly 0.034-0.048 ms across 99 samples.]
Components of the Solution Stack
Solutions still matter in an industry-standard, open source world.
[Diagram: FSI-HPC solution stack layers — users, application environment, workload, middleware, integrated systems, server, interconnect/L2 fabric, operating system, BIOS, x86-64 server architecture services — matched with the Red Hat / HP components at each layer: tuning and joint lab work, Red Hat MRG tuning tools, Red Hat MRG Messaging / Grid, Red Hat / HP systems, HP Voltaire / Red Hat RDMA, Red Hat MRG Realtime, HP reduced-SMI BIOSes, HP compute and storage.]
Determinism and performance need to work at each layer; HP and Red Hat are partnered across the stack.
Hardware matters
- Scale-up: blades
- Scale-out: rack-optimized, SL6000
Today's RFP metrics: performance/watt, performance/BTU, performance/rack.
HP low-latency lab with MRG; Red Hat MRG lab with HP BL460/BL685 and InfiniBand.
Dealing with SMIs: HP BIOS Option for Low-Latency Apps
Disables frequent SMIs used for Dynamic Power Savings Mode, CPU utilization monitoring, P-state monitoring, and ECC reporting.
Benefits both RHEL and MRG operating environments.
[Charts: latency spikes with standard BIOS settings vs. latencies with SMIs disabled in BIOS.]
MRG Realtime: RHEL on HP systems
- Enables applications and transactions to run predictably, with guaranteed response times
- Upgrades RHEL 5 to a realtime OS: provides a replacement kernel for RHEL 5 (x86/x86_64)
- Preserves RHEL application compatibility
- Certified on HP hardware; see the Red Hat / HP certifications
[Chart: response time vs. time.]
MRG Realtime Scheduling Latency
Vanilla kernel: Min 1, Max 2857, Mean 11.47, Mode 9.00, Median 9.00, Std. Deviation 54.94
MRG RT kernel: Min 4, Max 43, Mean 8.34, Mode 8.00, Median 8.00, Std. Deviation 1.49
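Summary statistics like those above come from a raw stream of per-wakeup scheduling latencies. A minimal sketch of computing the same summary with Python's standard `statistics` module (the sample list is made up for illustration, not the measured data behind this slide):

```python
import statistics

# Hypothetical per-wakeup scheduling latencies; the real numbers on this
# slide came from measured vanilla and MRG RT kernels, not this list.
samples = [8, 8, 9, 8, 7, 10, 8, 9, 43, 8, 8, 9]

summary = {
    "min": min(samples),
    "max": max(samples),
    "mean": round(statistics.mean(samples), 2),
    "mode": statistics.mode(samples),        # most common value
    "median": statistics.median(samples),
    "stdev": round(statistics.stdev(samples), 2),  # sample std deviation
}
print(summary)
```

Note how a single outlier (the 43 here, or the 2857 on the vanilla kernel) barely moves the median and mode but inflates the mean and standard deviation, which is why the slide reports all of them.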
Networking matters: Voltaire DDR and QDR InfiniBand
Switch: 36 QDR QSFP ports, Ethernet management port, LEDs, USB port, serial port.
Test configuration:
- Two Nehalem-based servers with ConnectX PCI-E HCAs, connected back-to-back
- QDR ConnectX HCA running at QDR; DDR ConnectX HCA running at DDR
- RHEL 5 Update 2, Mellanox verbs performance test
RoEE (RDMA on Enhanced Ethernet): defined as a verbs-compliant IB transport running over the emerging IEEE Converged Enhanced Ethernet standard.
www.openfabrics.org/archives/spring2009sonoma/monday/grun.pdf
MRG Grid
Provides leading high-performance and high-throughput computing:
- Brings the advantages of scale-out and flexible deployment to any application or workload
- Delivers better asset utilization, allowing applications to take advantage of all available computing resources
Enables building cloud infrastructure and aggregating multiple clouds:
- Integrated support for virtualization as well as public clouds
- Seamlessly aggregates multiple cloud resources into one compute pool
Provides seamless and flexible computing across:
- Local grids
- Remote grids
- Private and hybrid clouds
- Public clouds (Amazon EC2)
- Cycle-harvesting from desktop PCs
Based on Condor and includes:
- Enterprise supportability from Red Hat
- Web-based management console: unified management across all of MRG for job, system, license, and workload management/monitoring
- Low-latency scheduling: enables job submission to Condor via AMQP Messaging clients; enables sub-second, low-latency scheduling for sub-second jobs
- Virtualization support via libvirt integration: supports scheduling of virtual machines on Linux using the libvirt APIs
- Cloud integration with Amazon EC2: enables automatic cloud provisioning, job submission, results storage, and teardown via the Condor scheduler; extensible, so it can be a dependency for other jobs or executed based on rules (e.g. add capacity in the cloud if the local grid is out of capacity)
- Concurrency limits: set limits on how much of a certain resource (e.g. software licenses, DB connections) can be used at once
- Dynamic slots: mark slots as partitionable and sub-divide them dynamically so that more than one job can occupy a slot at once
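The concurrency-limits feature above is driven from the Condor submit description file. A hypothetical sketch (the executable name and the `license_x` pool are illustrative, not from the slides; the pool size itself would be set in the pool configuration, e.g. `LICENSE_X_LIMIT = 10` on the negotiator):

```
# Hypothetical Condor submit description file.
universe           = vanilla
executable         = /usr/local/bin/price_model   # illustrative binary
arguments          = --scenario baseline
output             = run.$(Process).out
error              = run.$(Process).err
log                = run.log

# Each running instance consumes one unit of the "license_x" pool;
# Condor will not start more jobs than the configured limit allows.
concurrency_limits = license_x

queue 8
```

With this in place, the scheduler queues all eight jobs but caps how many run concurrently against the shared license pool, regardless of how many slots are free.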
Testing and developing solutions working together... delivered in reference papers and certifications
Red Hat / HP white paper:
[Charts: throughput and memory usage (cache, buffers, free) compared across 1-GigE, 10-GigE, IPoIB, IB SDP, and IB RDMA transports.]
Additional Information
www.redhat.com/mrg
www.hp.com/go/fsi