OpenVMS 8.4 on Tukwila Quad Core 9300: Blazing Performance
Prashanth K E, Technical Architect, OpenVMS Lab
24th June 2010
k.e.prashanth@hp.com
Agenda
- Today's Mission-Critical challenges
- Mission-Critical Converged Infrastructure
- HP's Server Strategy
- Introduction to Intel Itanium 9300-series processor based servers
- BL8x0c i2 Configurations Supported by OpenVMS
- BL8x0c i2 Performance Features and OpenVMS Test Results
- OpenVMS Support for RAS features on BL8x0c i2
- OpenVMS Power Management Support
- Q & A
Mission-Critical Customer Challenges
- Financial Services: Every minute of downtime = a minute of lost revenue!
- Manufacturing and Distribution: Production comes to a grinding halt
- Healthcare: Patient outcomes depend on 24x7 access to data
- Public Sector and CME: Customer retention and fraud detection at risk

- No tolerance for downtime
- Increasing SLAs with decreasing budgets
- Islands of legacy apps and monolithic systems
The First Mission-Critical Converged Infrastructure
New Integrity systems optimized for the converged infrastructure
HP Converged Infrastructure: Storage, Servers, Power & cooling, Network, Management software
- A common, modular architecture that simplifies, consolidates, and automates everything
- A mission-critical infrastructure delivering the highest levels of reliability and flexibility
Introducing the Revolutionary Blade Scale Architecture
The first mission-critical Converged Infrastructure on the industry's #1 blade platform
- Unified blade architecture from x86 to Superdome: Simplify by consolidating applications on a common platform
- FlexFabric: Flexibly scale resources to any workload
- Always-on resiliency: 100+ innovations to ensure global business continuity
- Matrix operating environment: Instantly adjust infrastructure to business demands
Unified Blade Architecture from x86 to Superdome
Simplify by consolidating applications on a common platform
[Diagram labels: ProLiant; Integrity Systems; BladeSystem Matrix; Integrity NonStop BladeSystem (four items marked NEW); blades: Intel and AMD x86, 2s, 4s, 8s, Superdome 2 cell blade (2-64s); enclosures: c3000, c7000, Superdome enclosure; up to 4,096 nodes]
Train once, certify once, deploy once
FlexFabric
Scale resources to meet any workload demand
- Within the system
  - Blade Link: Linear scaling from 2-8 sockets
- Across the network, dynamically managed
  - Crossbar Fabric: Fault-tolerant flexible scaling
  - Virtual Connect: Wire-once, change-ready connectivity
Matrix operating environment, delivered by Insight Dynamics
Always-on Resiliency
100+ new innovations to ensure global business continuity
[Timeline figure. Era labels: 1990s, 2000s, 2010s. Category labels: Single-system resiliency, Virtualization resiliency, Converged Infrastructure resiliency. The assignment of individual features to eras is not recoverable from the extracted text.]
Features shown:
- Fault Tolerant Crossbar fabric with E2E retry
- Redundant manageability, no MP-SPOF
- Redundant, hot swap I/O interconnect modules
- Memory DDDC + 1 SBE
- PCI advanced error handling
- Passive XBAR Backplane
- Dynamic clock fail-over
- QPI CRC, retry, link width reduction
- On-line FW upgrades
- Memory double chip sparing
- PCI soft-fail
- 3 Data Center DR
- Register file and TLB parity
- Cell OLAD
- N+1 VRMs
- ServiceGuard for RAC
- Redundant hot swap clocks
- Automatic Processor Recovery
- ServiceGuard Storage Mgt.
- Fabric retry and lane sparing
- PCI OLAR
- Intel Cache Safe Technology
- Electrically isolated partitions
- Memory chip-kill ECC
- Enterprise HP-UX
- Poison and viral error containment
- Dynamic Processor Resiliency
- Memory addr/ctl Parity
- Dual grid capable
- Hot swap I/O
- N+1 hot swap Fans
- Memory single-bit ECC
- Page de-allocation
- Dual path I/O
- N+1 hot swap AC/DC
- Dynamic Memory Resiliency
- Serviceguard
- Blade OLAD
- FBD CRC, retry and spare lane
- XBAR OLR
- 2N AC/DC
Matrix Operating Environment
Common management delivered by HP Insight Dynamics
- Provision infrastructure in minutes
- Optimize workloads confidently
- Reduce data center and energy costs
- New for HP-UX: Automated orchestration for physical blades
- Only for HP-UX: Most comprehensive automated workload management
- HP ranked #1
- Optimized energy usage without compromising performance or reliability
HP's Server Strategy
Integrity value redefines the data center with a future business critical infrastructure
Today
- Growing family of bladed systems
- Cellular Mid-range & High-end System
- Low-cost highly-capable entry-class systems
2010: More reliability and business criticality with
- New Processor: 9300 series
- Superdome-Gen2 2-64s SMP server
- 2-8S SMP Server Blades, supported in C7000 & C3000 chassis
- 2s/2U rack mount server
Future
- New Processor: Poulson
- New Processor: Kittson
- More modular design
- More common components
- More flexible and customizable architecture
- More performance
- Enhanced robustness and fully virtualized
Delivering improved data center economics
Introduction to Intel Itanium 9300-series processor based servers
BL8x0c i2 Overview
Processors and chipset
- Intel Itanium 9300-series processors
- Intel E7500 Scalable Memory Buffer
- Intel E7500 IOH
- Intel ICH10 south bridge
Form factor
- Full-height c-Class form factor
- Single wide: BL860c i2 (2s)
- Double wide: BL870c i2 (4s)
- Quad wide: BL890c i2 (8s)
- Supported in c3000 and c7000 enclosures
Memory (per blade)
- ~6X memory BW increase over previous generation
- 24 PC3-8500 DIMM sockets
- 192 GB capacity per blade with 8GB DIMMs
I/O subsystem (per blade)
- Additional IO options
- 5X IO BW vs. previous generation
- Two hot-plug SFF SAS HDDs
- Integrated P410i RAID controller per blade
- 2 dual-port 10GbE Flex-10 NICs; VC Partner blade support
- 3 PCIe G2 mezz slots
High availability
- Enhanced processor RAS features
- Memory double chip spare
- Internal SAS RAID
- Enhanced interconnect RAS
Power Management
- Enhanced Demand Based Switching
- Turbo Boost
Management
- Integrated Lights Out (iLO 3)
- Integrated VGA console
- iLO 3 Advanced Pack firmware
- c-Class Onboard Administrator
Operating Systems Support
- HP-UX 11i v3
- OpenVMS 8.4 (Aug 10)
- Windows 2008 R2 (to follow)
Intel Itanium 9300-series Platform Overview
Connectivity
- Fully-connected Intel QuickPath Interconnect (QPI) links
- Intel E7500 chipset
Memory
- CPU-integrated Memory Controller
- Intel E7500 Memory Expanders
- DDR3 RDIMMs
Enabling Technologies
- Advanced Power Management
- Intel Virtualization Technology
Advanced RAS
- New SI reliability features
- New Interconnect Reliability features
- Memory RAS (Sparing, migration ...)
[Block diagram: two 9300 series processors connected by QPI, each attached through memory buffers (MB) to DDR3 DIMMs; the Intel E7500 IOH provides PCIe Gen2 to PCIe cards and connects to the ICH10 (Gen1, Core I/O)]
2009 HP Confidential template rev. 12.10.09. ©2010 Hewlett-Packard Company L.P. All Rights Reserved.
Scalable Blade Link
Linear scalability with the industry's first 2-4-8 socket server blades
Blade Link combines multiple blades into a single, scalable system
- Scale Up, Out and Within
- Scale More: Only 8-socket blade in an industry standard blade enclosure
- Scale Linear: System resources grow evenly across CPU, memory, I/O, etc.

CPU       Memory   LAN           HDDs
2s/8c     96GB     4 x 10GbE     2 Slots
4s/16c    192GB    8 x 10GbE     4 Slots    (x 2)
8s/32c    384GB    16 x 10GbE    8 Slots    (x 2)

8-socket system at 2x the performance in half the footprint
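The "Scale Linear" figures on the Blade Link slide follow from multiplying one blade's resources by the number of linked blades. The sketch below reproduces that arithmetic; the class and field names are illustrative assumptions, and only the per-blade numbers come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BladeResources:
    sockets: int
    cores: int
    memory_gb: int
    ten_gbe_ports: int
    hdd_slots: int

    def scaled(self, blades: int) -> "BladeResources":
        """Resources of a system built from `blades` identical linked blades."""
        return BladeResources(
            self.sockets * blades,
            self.cores * blades,
            self.memory_gb * blades,
            self.ten_gbe_ports * blades,
            self.hdd_slots * blades,
        )


# Per-blade figures as listed on the slide for the 2-socket building block.
BASE_BLADE = BladeResources(sockets=2, cores=8, memory_gb=96,
                            ten_gbe_ports=4, hdd_slots=2)

for blades in (1, 2, 4):
    r = BASE_BLADE.scaled(blades)
    print(f"{blades} blade(s): {r.sockets}s/{r.cores}c, {r.memory_gb}GB, "
          f"{r.ten_gbe_ports} x 10GbE, {r.hdd_slots} HDD slots")
```

Linking 1, 2 and 4 blades corresponds to the 2-socket, 4-socket and 8-socket rows of the slide.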
Migration from Current Integrity Server Products
[Diagram, by socket count]
- 2-socket: rx2660, BL860c, rx3600 -> BL860c i2
- 4-socket: BL870c, rx6600 -> BL870c i2
- 8-socket: rx7640 -> BL890c i2
rx2800 i2 2p/8c RACKMOUNT: Value and Key Features (coming soon)
- Processor: Up to two Dual-Core or Quad-Core Intel Itanium processors
- Memory: Industry standard DDR3 technology; 24 PC3-8500 DIMM sockets; 192GB max (with 8GB DIMMs)
- Internal Storage: 8 Hot-Plug SFF Serial Attached SCSI HDDs; 1 CD+RW or DVD+RW
- Networking: Dual Integrated Gigabit Ethernet ports; Manageability LAN
- I/O Slots: 6 PCI-E slots (2 x8, 4 x4 slots)
- Management: Management Processor with iLO 3 (Integrated Lights Out functionality)
- Form Factor: 2U Rack Mount Server (office tower conversion kit available)
Key Points:
- Affordable, 8-core scalable entry-level non-x86 server
- High Density Compute Server (2U footprint)
- Excellent memory capacity
- Continued innovations in RAS
- Easy to deploy into today's racked environments
- Office Friendly pedestal option
- N+N redundant power and cooling
BL8x0c i2 configurations supported by OpenVMS
Supported configurations for OpenVMS (only major items mentioned)
Supported
- BL860c i2, BL870c i2, BL890c i2
- LAN, FC pass thru and switches
- c3000, c7000 enclosures
- Core I/O SAS disks (RAID mode)
- Network NICs
  - 10 GigE LOM
  - 10 Gbps mezz
  - 1 Gbps quad-port (NC364m)
  - 1 Gbps dual-port (NC360m)
- Fibre Channel HBA
  - 8 Gbps dual-port FC (QLogic)
- 3 Gbps external SAS P700m
- OpenVMS guest, HPVM V4.2 PK1
- HP Insight Control 6.1
- vMedia, DVD (internal, USB)
- MDS600, P2000 G3, MSA2000 G2
Not supported in the Aug 10 release
- 8GB DIMMs
- 8 Gbps FC HBA from Emulex
- Core I/O SAS HBA mode
- Virtual Connect, Flex10
- FCoE
- vFlash, USB flash disk
BL8x0c i2 Performance Features
Performance Enhancing Factors on BL8x0c i2 servers
Enhanced thread level parallelism
- Double the number of cores
- Enhanced hyper thread management
Directory-Based Cache Coherency
- 9100 series processors used a snooping mechanism for cache coherence
  - Coherency traffic increases with the number of processors
  - Contention for bandwidth
- 9300 series uses directory-based cache coherency
  - Home agent for each memory controller
  - Tracks owners and sharers of a given cache line
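The difference between snooping and a directory can be illustrated with a toy model that counts which CPUs must be contacted on a write. This is a heavily simplified sketch, not the actual 9300-series protocol: the `Directory` class, its methods and the 8-CPU example are all assumptions made for illustration.

```python
class Directory:
    """Toy directory: per cache line, remember the owner and the sharers."""

    def __init__(self):
        self.sharers = {}  # line address -> set of CPU ids holding a copy
        self.owner = {}    # line address -> CPU id with write ownership

    def read(self, cpu, line):
        # A read by another CPU downgrades any exclusive owner to a sharer.
        if self.owner.get(line) not in (None, cpu):
            del self.owner[line]
        self.sharers.setdefault(line, set()).add(cpu)

    def write(self, cpu, line):
        """Return the CPUs that must be sent an invalidate message."""
        targets = self.sharers.get(line, set()) - {cpu}
        self.sharers[line] = {cpu}
        self.owner[line] = cpu
        return targets


def snoop_targets(cpu, num_cpus):
    """With snooping, every other CPU is probed on each write."""
    return set(range(num_cpus)) - {cpu}


# Example: 8 CPUs, but only CPUs 0 and 3 have touched line 0x40.
d = Directory()
d.read(0, 0x40)
d.read(3, 0x40)
invalidated = d.write(0, 0x40)
probed = snoop_targets(0, 8)
print(f"directory contacts {len(invalidated)} CPU(s), snooping probes {len(probed)}")
```

In this model the directory's traffic depends on how many CPUs actually share the line, while the snooping traffic grows with the CPU count, which is the scaling problem the slide describes.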
Performance Enhancing Factors on BL8x0c i2 servers
Memory
- Memory controllers integrated with processor (2 per processor)
  - rx6600: 1 memory controller for 4 sockets
  - BL870c i2: 8 dual port memory controllers for 4 sockets
- DDR3 memory
- Support for larger memory configurations
Point-to-Point connections
- QuickPath Interconnect
- Higher bandwidth between processors and IO
Performance Enhancing Factors on BL8x0c i2 servers
IOH (Intel E7500)
- Connects CPU to PCIe gen2 IO (core and mezz)
- Larger number of PCIe gen2 lanes increases IO throughput
Data TLB support for 8K and 16K pages
- Faster translations via the TLB
Resource Affinity Domains
- 5 standard RAD configurations
- Gives users the flexibility to define a memory layout optimized for their application
OpenVMS Performance Test Results
Price performance migration path
[Diagram: cores vs. server generation. Labels: 2-Socket (rx2660, BL860c, rx3600, BL860c i2); 4-Socket (BL870c, rx6600, BL870c i2); 8-Socket (rx7640, BL890c i2); 8-core; 16-core; New Integrity blades]
New Integrity blades are more scalable than the previous generation
- Double the number of cores
- More memory
- More I/Os
Rdb Tests: Rdb Performance Load Tests, OpenVMS V8.4
[Charts: Rdb performance in TPS (more is better) and in sec/txn (less is better); rx8640 (1.60GHz/12.0MB) vs. BL890c-i2 (1.60GHz/6.0MB)]
- BL890c i2 performs 2x better than rx8640
- Transactions per second (TPS) are 2x
- The time taken per transaction is reduced to half on the i2 server
- 2X performance improvement with HT enabled
- Rdb takes advantage of hyper-threading on i2 servers
Apache Performance: Apache Bench Tests on OpenVMS V8.4
[Charts plotted against load: Bandwidth (more is better), Throughput (more is better), Time Taken in sec (less is better); BL860c (1.59GHz/9.0MB) vs. BL860c-i2 (1.73GHz/6.0MB)]
- 2X performance improvement: BL860c-i2 delivered 2x performance compared to BL860c
Java Workload Tests: Native Java Tests on OpenVMS V8.4
[Charts: operation rate vs. number of threads (full range and zoomed to 8-16 threads); rx6600 (1.59GHz/12.0MB) vs. BL870c i2 (1.60GHz/5.0MB)]
- Java workloads scale up better on i2 servers; the BL870c i2 scaled up much better
- 2X performance improvement at higher workloads
- Java workloads are highly CPU and memory intensive
Memory tests: Throughput, Memory Bandwidth (more is better)
[Chart: single stream test, MB/sec; rx3600 (1.67GHz/9.0MB) vs. BL860c-i2 (1.73GHz/6.0MB)]
- Single stream test shows 55% improvement in memory bandwidth between rx3600 and BL860c i2
- Memory bound applications would benefit from aggregated bandwidth
CPU Ratings: Integer Test Rating (more is better)
[Chart: per-core ratings for 9300 - BL8x0c-i2 (1.73GHz/6.0MB), 9300 - BL8x0c-i2 (1.60GHz/6.0MB), 9300 - BL8x0c-i2 (1.33GHz/4.0MB), 9000 - BL860c (1.59GHz/9.0MB), 9100 - rx7640 (1.60GHz/12.0MB)]
- 1.6 GHz 9300 series cores show 4% improvement over 9100/9000 cores
- 1.73 GHz 9300 series cores show 15% improvement
- These numbers are per core (within a processor/socket)
- As the frequency increases, we see an increase in rating
- CPU bound applications (e.g. database queries) should benefit, specifically integer-computation-bound applications
CPU Ratings: Whetstone FP Computation Tests, MWIPS Rating (more is better)
[Chart: per-core ratings for 9300 - BL8x0c-i2 (1.73GHz/6.0MB), 9300 - BL8x0c-i2 (1.60GHz/6.0MB), 9300 - BL8x0c-i2 (1.33GHz/4.0MB), 9000 - BL860c (1.59GHz/9.0MB), 9100 - rx7640 (1.60GHz/12.0MB)]
- 1.6 GHz 9300 series cores show 8% improvement over 9100/9000 cores
- 1.73 GHz 9300 series cores show 17% improvement
- These numbers are per core (within a processor/socket)
- Fast response to complex operations; scientific, automation and robotic applications should benefit
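As a rough sanity check on the per-core figures from the two CPU Ratings slides, one can ask what gain the 1.73 GHz part would show if the rating simply scaled with clock frequency from the 1.60 GHz result. Linear clock scaling is an assumption made here for illustration; the slides only state that the rating increases with frequency.

```python
def clock_scaled_gain(gain_at_base, base_ghz, target_ghz):
    """Gain expected at target_ghz if per-core rating scaled linearly with clock."""
    return (1 + gain_at_base) * (target_ghz / base_ghz) - 1


# Reported gains at 1.60 GHz: 4% (integer) and 8% (Whetstone).
integer_pred = clock_scaled_gain(0.04, 1.60, 1.73)
whetstone_pred = clock_scaled_gain(0.08, 1.60, 1.73)

print(f"Integer:   clock-scaled estimate {integer_pred:.3f}, reported 0.15")
print(f"Whetstone: clock-scaled estimate {whetstone_pred:.3f}, reported 0.17")
```

Under this assumption the Whetstone estimate lands close to the reported 17%, while the integer estimate is somewhat below the reported 15%.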
Performance Tests in Progress
- IO tests
- Oracle tests
OpenVMS V8.4 Performance Improvements
- Integrity RAD Support
- Support for TCP/IP Packet Processing Engine (PPE)
  - Gains of approximately 5%
- Solid State Disks
  - Vendor benchmarks
  - Internal testing with EVA
Clustering Software Enhancements
- PE driver
  - Benefits deployments that use multiple channels
  - Improvement of 50% observed in some use cases
- Dedicated Lock Manager
  - Improved request buffer handling
  - Results in 2x improvement in some cases
  - Applications such as relational databases benefit
- Shadow Driver
  - Improvement of 10-12% observed in some use cases
OpenVMS V8.4 Performance Improvements
General Operating System Performance enhancements
- Enhanced memcmp and strcmp for Integrity systems
- Image Activation
  - Performance gains of 40-50%
  - Applications that perform many image activations gain
- VA tear down
  - Batch jobs and workloads that create and stop processes improve
- Global Section unmap improvements
- Exception Handling (on par with Alpha now)
- InnerMode semaphore upcalls for Exec and Kernel mode
- System service dispatch enhancements
- Pthread spinlock algorithm changes
- Dynamic enabling of XFC cache for mounted volumes
- PageDyn LALs
OpenVMS Support for RAS features on BL8x0c i2
Intel Itanium 9300-series RAS Features
- Enhanced reliability, availability and serviceability (RAS) over the 9100 series
- Extensive capabilities to detect, recover and report errors
- RAS features for
  - Processor/Socket
  - Memory
  - Interconnect and Miscellaneous
  - IOH and Partitions
Intel Itanium 9300-series Processor RAS Features
- Error avoidance, detection and correction across all core structures
- Soft error hardened latches and registers
  - Designed to improve resistance to soft errors by up to 100X
- Error Correcting Code (ECC) or parity
  - Widely used algorithms implemented in hardware to monitor errors that can occur during transmission
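The principle behind parity and ECC can be shown with the textbook Hamming(7,4) code, which adds three check bits to four data bits so that any single flipped bit can be located and corrected. This is a teaching sketch only; the codes used inside the 9300-series hardware are not described in the slides and are certainly wider than this.

```python
def parity(bits):
    """Even-parity bit: detects (but cannot locate) a single flipped bit."""
    return sum(bits) % 2


def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    c = [0] * 8                      # index 0 unused, keeps positions 1-based
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]        # check bit covering positions 1,3,5,7
    c[2] = c[3] ^ c[6] ^ c[7]        # check bit covering positions 2,3,6,7
    c[4] = c[5] ^ c[6] ^ c[7]        # check bit covering positions 4,5,6,7
    return c[1:]


def hamming74_decode(word):
    """Return (data bits, syndrome); a non-zero syndrome is the corrected position."""
    c = [0] + list(word)
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome] ^= 1             # flip the bit the syndrome points at
    return [c[3], c[5], c[6], c[7]], syndrome


codeword = hamming74_encode([1, 0, 1, 1])
corrupted = list(codeword)
corrupted[4] ^= 1                    # soft error in position 5
recovered, position = hamming74_decode(corrupted)
print(recovered, "corrected position", position)
```

Parity alone corresponds to the "detect" half of the slide's bullet; the Hamming syndrome adds the "correct" half for single-bit errors.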
Intel Itanium 9300-series Processor RAS Features
- Intel Cache Safe Technology
  - Maps out bad cache lines using heuristics
  - Cache data is automatically scrubbed for single bit errors
  - 9300 series covers the L2 cache and directory cache as well, as opposed to only the L3 cache in the 9100
- Advanced Machine Check
  - Many errors that were fatal in the 9100 are now correctable
  - OpenVMS logs correctable errors in the error logs
- Dynamic Processor Resilience (DPR)
  - Support for CPU indictment with help of WEBES
Intel Itanium 9300-series Memory RAS Features
- Enhanced ECC protection
  - ECC mechanism to detect and fix errors in attached memory components
  - Support for Single and Double Device Data Correction
- Memory Thermal Protection
  - Support for closed and open loop throttling mechanisms to reduce failures due to overheating of components
- Memory sparing
  - Monitor memory for errors
  - On reaching a threshold, firmware copies data from DRAM to a spare
  - Transparent to the Operating System
Intel Itanium 9300-series Interconnect RAS Features
- QPI Error Detection and Correction
  - CRC used to detect errors
  - Transactions retried
  - Channel physically reset
  - Bad lanes can be mapped out (can impact performance)
- Intelligent error management
  - A single dropped packet can have a cascade effect, causing a whole lot of errors
  - Sort and analyze the packets to determine the source
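The detect-then-retry pattern on the interconnect slide can be sketched in software. The example below uses Python's `zlib.crc32` as a stand-in checksum and a hypothetical `flaky_channel` that corrupts the first few transfers; QPI uses its own link-level CRC and hardware retry, so the frame layout, function names and retry limit here are assumptions for illustration.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return the payload if the CRC matches, otherwise None."""
    payload, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == crc else None


def send_with_retry(payload: bytes, channel, max_retries: int = 3):
    """Send a frame, retrying on CRC failure; returns (payload, attempts used)."""
    frame = make_frame(payload)
    for attempt in range(1, max_retries + 2):
        received = check_frame(channel(frame))
        if received is not None:
            return received, attempt
    # A real link would escalate here, for example by resetting the channel.
    raise RuntimeError("link failed after retries")


def flaky_channel(bad_transfers: int):
    """Hypothetical channel that flips one bit in the first `bad_transfers` frames."""
    remaining = [bad_transfers]

    def channel(frame: bytes) -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            return bytes([frame[0] ^ 0x01]) + frame[1:]
        return frame

    return channel


result = send_with_retry(b"cache line", flaky_channel(2))
print(result)
```

When retries are exhausted, the sketch raises an error at the point where the slide's next steps (physical channel reset, mapping out bad lanes) would apply.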
OpenVMS Power Management Support
Power Management
- Enhanced Demand Based Switching
  - Ability to operate at reduced voltage and frequency until more power is needed
  - Can run different P-states: predefined voltage and frequency combinations at which a processor can operate correctly
- Intel Turbo Boost
  - Processor can run at frequencies higher than the advertised frequency
  - Provided the processor package is below rated power, thermal and current limits
  - If a core is idle, the active cores can use its headroom to get a boost in frequency
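Why running at a lower P-state saves power can be seen from the usual first-order model for dynamic CMOS power, P ≈ C · V² · f: lowering voltage and frequency together reduces power faster than it reduces clock speed. The sketch below applies that model to a hypothetical P-state table. The frequencies merely echo clock speeds quoted elsewhere in this deck, and the voltages are invented for illustration; none of these are Intel-specified P-states.

```python
# Hypothetical P-state table: (name, frequency in GHz, core voltage in V).
# The voltages are made-up illustrative values, not Intel specifications.
P_STATES = [("P0", 1.73, 1.10), ("P1", 1.60, 1.05), ("P2", 1.33, 0.95)]


def relative_dynamic_power(freq_ghz, volts, ref_freq_ghz, ref_volts):
    """Dynamic power relative to a reference point, using P ~ C * V^2 * f."""
    return (volts / ref_volts) ** 2 * (freq_ghz / ref_freq_ghz)


_, ref_freq, ref_volts = P_STATES[0]
relative_powers = []
for name, freq, volts in P_STATES:
    power = relative_dynamic_power(freq, volts, ref_freq, ref_volts)
    relative_powers.append(power)
    print(f"{name}: {freq / ref_freq:.0%} of P0 clock, {power:.0%} of P0 dynamic power")
```

The model ignores static (leakage) power, so it overstates the savings a real processor would see.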
OpenVMS Power Management
- Highest Performance Mode
  - Use Turbo Boost to maximize performance
  - No power saving measures
- Dynamic Power Mode
  - Use Turbo Boost when a process is executing
  - Use C-states (halt states) to save power when a CPU is idle
- Low Power Mode
  - Use P-states to reduce power usage when a process is executing
  - Use C-states to save power when a CPU is idle
- OS Control Mode
  - Enable a system service or sysgen parameter to select among power modes
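The three fixed modes on the slide can be summarized as a small policy table, with OS Control Mode modeled as a selector over them. The type and function names below are hypothetical and are not OpenVMS interfaces; the slide does not name the system service or sysgen parameter involved. Treating Turbo Boost as off in Low Power Mode is an inference from the slide, which lists only P-states and C-states for that mode.

```python
from dataclasses import dataclass
from enum import Enum


class PowerMode(Enum):
    HIGHEST_PERFORMANCE = "highest performance"
    DYNAMIC_POWER = "dynamic power"
    LOW_POWER = "low power"


@dataclass(frozen=True)
class Policy:
    turbo_boost: bool          # may exceed the advertised frequency
    p_states: bool             # reduce voltage/frequency while executing
    c_states_when_idle: bool   # halt idle CPUs to save power


POLICIES = {
    PowerMode.HIGHEST_PERFORMANCE: Policy(True, False, False),
    PowerMode.DYNAMIC_POWER: Policy(True, False, True),
    PowerMode.LOW_POWER: Policy(False, True, True),
}


def select_mode(name: str) -> Policy:
    """Hypothetical stand-in for OS Control Mode: pick a policy by mode name."""
    return POLICIES[PowerMode(name.lower())]


for mode, policy in POLICIES.items():
    print(f"{mode.value}: {policy}")
```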
Q&A
Questions/Comments
- Business Manager (Vivasvan Shastri): viva@hp.com
- Office of Customer Programs: OpenVMS.Programs@hp.com
References and Acknowledgements
Intel Whitepapers
- http://download.intel.com/products/processor/itanium/323247.pdf
- http://download.intel.com/products/processor/itanium/318691.pdf
- http://software.intel.com/sites/oss/pdfs/power_mgmt_intel_arch_servers.pdf
- http://www.intel.com/design/itanium/documentation.htm
WEBES Documentation
- http://h18023.www1.hp.com/support/svctools/webes/index.html?jumpid=reg_r1002_USEN
HP Whitepapers
- OpenVMS Technical Journal 14 article on OpenVMS Power Management by Prithvi Srihari, Burns Fisher and Veena K
Thank You