OpenPOWER Performance
Alex Mericas, Chief Engineer, OpenPOWER Performance, IBM

Delivering the Linux ecosystem for Power
Solutions: solutions with full stack innovation for Big Data and Analytics, Cloud and ISVs; 30+ reference configurations for solutions
OpenPOWER: Google, NVIDIA, Tyan, Mellanox, Micron, Samsung, Canonical, POWERCORE; 250+ members
IBM Software: WebSphere, DB2, Cognos, Watson, Tivoli, Rational, Platform; 200+ applications
Linux ecosystem: Red Hat, SUSE and Ubuntu distributions; 2500+ Linux ISVs developing on Power
Open source: Docker, OpenStack, KVM, OpenCompute, NoSQL databases; 100,000+ open source packages

Faster memory access: S822LC delivers data from memory 2.2X faster than Intel Haswell when fully populated with DIMMs
Deliver 2.2X more memory bandwidth with S822LC versus Intel Haswell (E5-2600 v3), based on STREAM Triad memory bandwidth when fully configured.
[Bar chart: STREAM Triad (GB/sec)]
POWER8, IBM S822LC, 20c/160t: 189
x86, Intel Server System E5-2690 v3, 24c/48t: 85
IBM Power System S822LC results are based on IBM internal measurements of STREAM Triad; 20 cores / 20 of 160 threads active, POWER8, 3.5GHz, up to 1TB memory. Intel Xeon data is based on published data running STREAM Triad; 24 cores / 24 of 48 threads active, E5-2390 v3, 2.3GHz, up to 1.5 TB memory. For more details see http://www.intel.com/content/www/us/en/benchmarks/server/xeon-e5-2600-v3/xeon-e5-2600-v3-stream.html

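For readers unfamiliar with the benchmark: STREAM Triad computes a[i] = b[i] + q * c[i] over large arrays and counts three 8-byte words of memory traffic per element (two reads and one write). The following is a minimal Python sketch of the kernel and the bandwidth formula, not the code behind the figures above. The function names and array size are illustrative assumptions, and pure Python will land far below hardware bandwidth, so real measurements should use the official STREAM source. The last lines check the ratio of the two figures quoted on the slide.

```python
import time
from array import array

BYTES_PER_WORD = 8  # STREAM operates on 64-bit floating-point elements


def triad(a, b, c, q):
    """STREAM Triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bytes(n):
    """Bytes STREAM counts for one Triad pass: read b, read c, write a."""
    return 3 * BYTES_PER_WORD * n


def bandwidth_gb_per_s(n, seconds):
    """Convert one timed Triad pass into GB/s (1 GB = 1e9 bytes)."""
    return triad_bytes(n) / seconds / 1e9


def measure(n=200_000, q=3.0):
    """Run one Triad pass over n elements and return (a, GB/s)."""
    a = array("d", [0.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [1.0]) * n
    start = time.perf_counter()
    triad(a, b, c, q)
    elapsed = time.perf_counter() - start
    return a, bandwidth_gb_per_s(n, elapsed)


# Ratio of the two STREAM Triad figures quoted on the slide (GB/s).
S822LC_TRIAD = 189
HASWELL_TRIAD = 85
triad_ratio = S822LC_TRIAD / HASWELL_TRIAD

if __name__ == "__main__":
    _, gbps = measure()
    print(f"Pure-Python Triad: {gbps:.3f} GB/s (illustrative only)")
    print(f"Slide ratio: {triad_ratio:.1f}X")
```

The byte count is what makes Triad a bandwidth test: each element does only one multiply and one add, so run time is dominated by moving 24 bytes per element to and from memory.
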
Adding 2 NVIDIA Tesla K80 GPUs to IBM Power S822LC delivers up to 6.7X better performance on NAMD code
Faster time to insight and reduced operating costs with fewer systems
Accelerate performance and reduce operating costs in biomolecular research
[Bar chart: Relative Performance on the APOA1, F1ATPase and STMV benchmarks, S822LC / 16c / 3.3GHz versus S822LC / 16c / 3.3GHz / 2xK80]
Results are based on IBM internal testing of systems running NAMD version 2.10 APOA1, F1ATPase, STMV code, benchmarked on POWER8 systems each installed with 2 NVIDIA Tesla K80 GPUs. Individual results will vary depending on individual workloads, configurations and conditions.
Without GPUs: IBM Power System S822LC; 16 cores / 128 threads, POWER8; 3.3GHz, 128 GB memory
With GPUs: IBM Power System S822LC; 16 cores / 128 threads, POWER8; 3.3GHz, 128 GB memory, 2 NVIDIA K80 GPUs

IBM Power S822LC with NVIDIA Tesla K80s outperforms Xeon E5-2600 v3 with NVIDIA Tesla K80s for NAMD by up to 37%
IBM Power S822LC delivers superior results for NAMD
IBM Power S822LC is a superb platform for users of the NAMD molecular dynamics package
[Bar chart: GPU Accelerated NAMD Performance, IBM Power S822LC vs Haswell-EP; Relative Performance of IBM Power S822LC, 16-cores + 2x NVIDIA Tesla K80, against Xeon E5 v3 Host, 16-cores + 2x NVIDIA Tesla K80]
APOA1: 1.31
F1ATPase: 1.37
STMV: 1.16
Results are based on IBM & NVIDIA internal testing of systems running NAMD version 2.10 APOA1, F1ATPase, STMV code; Compilation: CUDA 7.0.28, ICC 15.1.133, MKL 11.2.1. Individual results will vary depending on individual workloads, configurations and conditions.
x86 system: Supermicro 2028GR-TRT, 16 cores, x86, 2.3GHz, 128GB memory, 2 NVIDIA K80 GPUs
POWER8 system: IBM Power System S822LC, 16 cores / 128 threads, POWER8, 3.3GHz, 128GB memory, 2 NVIDIA K80 GPUs

With More: POWER8 with NVLink: 2.5x Faster CPU-GPU Connection
PCIe, 32GB/s: system bottleneck. GPUs limited by PCIe bandwidth from CPU-system memory.
NVLink, 80 GB/s: NVLink enables fast unified memory access between CPU and GPU memories.
[Diagram: a CPU with DDR4 attached over PCIe to GPUs with HBM, compared with a POWER8 CPU with DDR4 attached over NVLink to GPUs with HBM]

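To put the two link rates in perspective, the sketch below estimates the ideal time to move a buffer between CPU and GPU memory at the slide's peak figures (32 GB/s for PCIe, 80 GB/s for NVLink). The 16 GB buffer size and the function name are illustrative assumptions, not values from the slide, and real transfers achieve less than peak bandwidth.

```python
PCIE_GB_PER_S = 32.0    # peak CPU-GPU bandwidth over PCIe, from the slide
NVLINK_GB_PER_S = 80.0  # peak CPU-GPU bandwidth over NVLink, from the slide


def transfer_seconds(size_gb, link_gb_per_s):
    """Ideal time to move size_gb gigabytes over a link at peak bandwidth."""
    if link_gb_per_s <= 0:
        raise ValueError("link bandwidth must be positive")
    return size_gb / link_gb_per_s


# Ratio of the two link rates; this is the "2.5x" in the slide title.
link_ratio = NVLINK_GB_PER_S / PCIE_GB_PER_S

if __name__ == "__main__":
    size_gb = 16.0  # hypothetical working set, not taken from the slide
    pcie_s = transfer_seconds(size_gb, PCIE_GB_PER_S)
    nvlink_s = transfer_seconds(size_gb, NVLINK_GB_PER_S)
    print(f"PCIe:   {pcie_s:.2f} s")
    print(f"NVLink: {nvlink_s:.2f} s")
    print(f"Ratio:  {link_ratio:.1f}x")
```
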
Better Design: Flat and Fat
System is engineered both flat and fat
Data flows freely across system
Nearly as broad from CPU:GPU as System Memory:CPU
Big pipes between GPUs on the same socket
Addresses PCI-E bottleneck for numerous usage models:
  Burst at startup/teardown
  Stream data constantly Host-Device
  Constant transfers between 2 GPUs
  Hidden bus transfers from Host-Device (due to insufficient BW)
[Diagram: two CPUs joined through an IB fabric, each with DDR4 at 115GB/s and a pair of GPUs connected by NVLink at 80 GB/s]

POWER8 with NVLink Out-Accelerates Xeon E5-2600 v4 with PCIe Attached GPU
IBM Power S822LC delivers 2.6X Queries per Hour
POWER8 with NVLink has superb acceleration
[Bar chart: KINETICA Queries per Hour (Filter=by-geographic area), Power S822LC for HPC versus Xeon E5-2640 v4]
Power S822LC for HPC, 20-cores:
  (2) IBM POWER8 with NVLink, 2.86 GHz, 20-cores, 160 threads
  1024 GB memory
  (3) 3.84 TB 2.5" 6 Gbps SSD
  (4) NVIDIA Tesla P100 with NVLink (GPU)
  NVLink
  Ubuntu 16.04.1 LTS
  CUDA 8.0
Competitor, Xeon E5-2640 v4, 20-cores:
  (2) Xeon E5-2640 v4 @ 2.40GHz, 20-cores
  512 GB memory
  (2) 800 GB Intel SSD DC S3510 Series 2.5" 6 Gb SSD
  (4) NVIDIA Tesla K80 (GPU)
  PCIe Gen3
  Ubuntu 16.04 LTS
  CUDA 8.0
All results are based on running Kinetica Filter by geographic area queries on a data set of 280 million simulated Tweets, with 1 up to 80 simultaneous query streams, each with 0 think time.

Resources and Support for Linux Developers
IBM PartnerWorld Technical Support: IBM Innovation Centers; free access to Power hardware; free porting assistance; free Eclipse-based development environment. www.ibm.com/partnerworld/wps/servlet/contenthandler/pw_com_pwp_partnerworldprogram
IBM Migration Factory: premier migration services for large applications. http://www-03.ibm.com/systems/services/labservices/migrationfactory
IBM Watson Developer's Cloud: access to IBM Watson for developing cognitive computing applications. http://www.ibm.com/watson/developercloud/
IBM Power Development Cloud: provides free access to Power hardware to ISVs for porting. www.ibm.com/partnerworld/wps/servlet/contenthandler/stg_com_sys_powerdevelopment-platform
IBM developerWorks: technical resources, community, blogs, toolkits, how-to articles, beta code. www.ibm.com/developerworks/linux/
Regional Ecosystem Initiative: recruiting key solutions; Greater China, North America, Europe; middleware and industry solutions
IBM Innovation Centers: all 50+ centers worldwide now support Linux on Power; one-stop for ISVs and developers; HW access, technical support, demos, toolkits, hands-on labs. www.ibm.com/systems/power/software/linux/centers
Site Ox: on-demand cloud-based development platform using Linux on POWER8. www.siteox.com

Performance resources for Linux on Power
Advanced Toolchain: Power Optimized GCC; Power Optimized runtime libraries
Power SDK: Programming Framework; Performance profiler; Performance guidance
IBM XL Compilers: High Performance C/C++ and Fortran Compilers
IBM Java: High Performance Java

NVIDIA IBM Acceleration Lab
Early Access to POWER8 with NVLink Technology: run on first & only systems with CPU-GPU NVLink; immediate performance gains from the wider bus and Tesla P100
Team up with IBM, NVIDIA on Advanced Acceleration: deep technical resources; custom plan to help migrate and optimize code together
Unlock What was Previously Impossible: bring new applications with unified memory & easier data movement
Apply for the program at: ibm.biz/accellab
Email for more information: accellab@us.ibm.com

The Acceleration Lab Supports All Kinds of Clients and Goals
Advanced Acceleration: Linux on Power, and GPU accelerated. Needs: Performance optimization for NVLink. Result: Optimized Throughput Performance.
Going Parallel: Linux on Power and not GPU accelerated. Needs: GPU acceleration. Result: Ready for Advanced Acceleration.
Getting to Power: x86 Linux, already GPU accelerated. Needs: Linux on Power port, benchmarking. Result: Ready for Advanced Acceleration.
Starting From Scratch: x86 Linux, no GPU acceleration. Needs: Power LE Port OR GPU Acceleration. Result: Ready for Going Parallel or Getting to Power.