OpenPOWER Performance


Alex Mericas, Chief Engineer, OpenPOWER Performance, IBM

Delivering the Linux ecosystem for Power
Solutions: full stack innovation for Big Data and Analytics, Cloud and ISVs. 30+ reference configurations for solutions.
OpenPOWER: Google, NVIDIA, Tyan, Mellanox, Micron, Samsung, Canonical, POWERCORE. 250+ members.
IBM Software: WebSphere, DB2, Cognos, Watson, Tivoli, Rational, Platform. 200+ applications.
Linux ecosystem: Red Hat, SUSE and Ubuntu distributions. 2500+ Linux ISVs developing on Power.
Open source: Docker, OpenStack, KVM, OpenCompute, NoSQL databases. 100,000+ open source packages.

Faster memory access: S822LC delivers data from memory 2.2X faster than Intel Haswell when fully populated with DIMMs
Based on STREAM Triad memory bandwidth when fully configured.
Deliver 2.2X more memory bandwidth with S822LC versus Intel Haswell (E5-2600 v3).
[Chart: STREAM Triad (GB/sec)]
  POWER8, IBM S822LC, 20c/160t: 189
  x86, Intel Server System E5-2690 v3, 24c/48t: 85
IBM Power System S822LC results are based on IBM internal measurements of STREAM Triad; 20 cores / 20 of 160 threads active, POWER8; 3.5GHz, up to 1TB memory. Intel Xeon data is based on published data running STREAM Triad; 24 cores / 24 of 48 threads active, E5-2690 v3; 2.3GHz, up to 1.5 TB memory. For more details see http://www.intel.com/content/www/us/en/benchmarks/server/xeon-e5-2600-v3/xeon-e5-2600-v3-stream.html
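For readers unfamiliar with the benchmark, the following is a minimal sketch of what the STREAM Triad kernel computes and how its bandwidth figure is derived. It is not the official STREAM code (which is written in C and Fortran) and pure Python will not come close to hardware bandwidth; the function names and the array size are illustrative assumptions. The only figures taken from the slide are 189 GB/sec and 85 GB/sec.

```python
import time

def triad(a, b, c, scalar):
    """STREAM Triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]

def triad_bytes(n, word_bytes=8):
    """Bytes STREAM counts for Triad: two reads and one write per element."""
    return 3 * word_bytes * n

def bandwidth_gb_per_s(n, seconds, word_bytes=8):
    """Convert an elapsed time for one Triad pass into GB/sec."""
    return triad_bytes(n, word_bytes) / seconds / 1e9

# Hypothetical problem size, far smaller than a real STREAM run.
n = 100_000
a = [0.0] * n
b = [1.0] * n
c = [2.0] * n

start = time.perf_counter()
triad(a, b, c, 3.0)
elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading

print(f"interpreted-Python Triad: {bandwidth_gb_per_s(n, elapsed):.3f} GB/sec")

# Ratio of the two figures quoted on the slide.
ratio = 189 / 85
print(f"S822LC vs E5-2690 v3: {ratio:.1f}X")
```

The 2.2X headline is simply the ratio of the two chart values; the kernel itself is three array streams, which is why Triad is treated as a memory bandwidth test rather than a compute test.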

Adding 2 NVIDIA Tesla K80 GPUs to IBM Power S822LC delivers up to 6.7X better performance on NAMD code
Faster time to insight and reduced operating costs with fewer systems.
Accelerate performance and reduce operating costs in biomolecular research.
[Chart: relative performance on APOA1, F1ATPase and STMV; S822LC / 16c / 3.3 GHz versus S822LC / 16c / 3.3 GHz / 2x K80]
Results are based on IBM internal testing of systems running NAMD version 2.10 APOA1, F1ATPase and STMV code, benchmarked on POWER8 systems each installed with 2 NVIDIA Tesla K80 GPUs. Individual results will vary depending on individual workloads, configurations and conditions.
  IBM Power System S822LC; 16 cores / 128 threads, POWER8; 3.3GHz, 128 GB memory
  IBM Power System S822LC; 16 cores / 128 threads, POWER8; 3.3GHz, 128 GB memory, 2 NVIDIA K80 GPUs

IBM Power S822LC with NVIDIA Tesla K80s outperforms Xeon E5-2600 v3 with NVIDIA Tesla K80s for NAMD by up to 37%
IBM Power S822LC delivers superior results for NAMD.
IBM Power S822LC is a superb platform for users of the NAMD molecular dynamics package.
[Chart: GPU Accelerated NAMD Performance, IBM Power S822LC vs Haswell-EP; relative performance of IBM Power S822LC, 16 cores + 2x NVIDIA Tesla K80, against a Xeon E5 v3 host, 16 cores + 2x NVIDIA Tesla K80]
  APOA1: 1.31
  F1ATPase: 1.37
  STMV: 1.16
Results are based on IBM and NVIDIA internal testing of systems running NAMD version 2.10 APOA1, F1ATPase and STMV code. Compilation: CUDA 7.0.28, ICC 15.1.133, MKL 11.2.1. Individual results will vary depending on individual workloads, configurations and conditions.
  Supermicro 2028GR-TRT, 16 cores, x86, 2.3GHz, 128GB memory, 2 NVIDIA K80 GPUs
  IBM Power System S822LC, 16 cores / 128 threads, POWER8, 3.3GHz, 128GB memory, 2 NVIDIA K80 GPUs
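The "up to 37%" headline follows from the relative-performance values on the chart. The short sketch below shows the arithmetic; it assumes the three values map to the benchmarks in the order they appear on the slide (APOA1, F1ATPase, STMV) and that the Xeon host is the 1.0 baseline. The variable and function names are illustrative.

```python
# Relative performance of the S822LC versus the Xeon E5 v3 host (baseline 1.0).
# Assumption: values are paired with benchmark names in slide order.
relative_perf = {"APOA1": 1.31, "F1ATPase": 1.37, "STMV": 1.16}

def percent_advantage(ratio):
    """Convert a relative-performance ratio into a percentage advantage."""
    return (ratio - 1.0) * 100.0

advantages = {name: round(percent_advantage(r)) for name, r in relative_perf.items()}
best = max(advantages, key=advantages.get)

for name, pct in advantages.items():
    print(f"{name}: {pct}% faster")
print(f"largest advantage: {best} at {advantages[best]}%")
```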

With More: POWER8 with NVLink: 2.5x Faster CPU-GPU Connection
PCIe, 32 GB/s (system bottleneck): GPUs limited by PCIe bandwidth from CPU-system memory.
NVLink, 80 GB/s: NVLink enables fast unified memory access between CPU and GPU memories.
[Diagram: a CPU with DDR4 attached to HBM-equipped GPUs over PCIe, compared with POWER8 with DDR4 attached to HBM-equipped GPUs over NVLink]
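The 2.5x figure is the ratio of the two link bandwidths on the slide. As a rough sketch of what that means for data movement, the snippet below estimates host-to-GPU transfer time at each peak rate. The 16 GB payload is a hypothetical example, and real transfers would fall short of peak bandwidth because of protocol overhead and latency, which this model ignores.

```python
# Peak link bandwidths quoted on the slide, in GB/s.
PCIE_GB_S = 32.0
NVLINK_GB_S = 80.0

def transfer_seconds(size_gb, bandwidth_gb_s):
    """Idealized transfer time: payload size divided by peak bandwidth."""
    return size_gb / bandwidth_gb_s

speedup = NVLINK_GB_S / PCIE_GB_S

# Hypothetical payload size, chosen only for illustration.
size_gb = 16.0
pcie_time = transfer_seconds(size_gb, PCIE_GB_S)
nvlink_time = transfer_seconds(size_gb, NVLINK_GB_S)

print(f"link speedup: {speedup}x")
print(f"{size_gb} GB over PCIe:   {pcie_time:.2f} s")
print(f"{size_gb} GB over NVLink: {nvlink_time:.2f} s")
```

For workloads that stream data between host and device continuously, this ratio bounds the best-case improvement in transfer-limited phases.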

Better Design: Flat and Fat
System is engineered both flat and fat:
  Data flows freely across the system
  Nearly as broad from CPU to GPU as from system memory to CPU
  Big pipes between GPUs on the same socket
Addresses the PCI-E bottleneck for numerous usage models:
  Burst at startup/teardown
  Stream data constantly, host-device
  Constant transfers between 2 GPUs
  Hidden bus transfers from host-device (due to insufficient BW)
[Diagram: two CPUs, each with DDR4 at 115 GB/s and an IB connection to the fabric; each CPU's pair of GPUs linked by NVLink at 80 GB/s]

POWER8 with NVLink Out-Accelerates Xeon E5-2600 v4 with PCIe Attached GPU
IBM Power S822LC delivers 2.6X Queries per Hour.
POWER8 with NVLink has superb acceleration.
[Chart: KINETICA Queries per Hour (Filter=by-geographic area); Power S822LC for HPC versus Xeon E5-2640 v4]
Power S822LC for HPC, 20 cores:
  (2) IBM POWER8 with NVLink, 2.86 GHz, 20 cores, 160 threads
  1024 GB memory
  (3) 3.84 TB 2.5" 6 Gbps SSD
  (4) NVIDIA Tesla P100 with NVLink (GPU)
  NVLink
  Ubuntu 16.04.1 LTS
  CUDA 8.0
Competitor, Xeon E5-2640 v4, 20 cores:
  (2) Xeon E5-2640 v4 @ 2.40GHz, 20 cores
  512 GB memory
  (2) 800 GB Intel SSD DC S3510 Series 2.5" 6 Gb SSD
  (4) NVIDIA Tesla K80 (GPU)
  PCIe Gen3
  Ubuntu 16.04 LTS
  CUDA 8.0
All results are based on running Kinetica "Filter by geographic area" queries on a data set of 280 million simulated Tweets, with 1 up to 80 simultaneous query streams, each with 0 think time.

Resources and Support for Linux Developers
IBM PartnerWorld
  Technical support
  IBM Innovation Centers
  Free access to Power hardware
  Free porting assistance
  Free Eclipse-based development environment
  www.ibm.com/partnerworld/wps/servlet/contenthandler/pw_com_pwp_partnerworldprogram
IBM Migration Factory
  Premier migration services for large applications
  http://www-03.ibm.com/systems/services/labservices/migrationfactory
IBM Watson Developer's Cloud
  Access to IBM Watson for developing cognitive computing applications
  http://www.ibm.com/watson/developercloud/
IBM Power Development Cloud
  Provides free access to Power hardware to ISVs for porting
  www.ibm.com/partnerworld/wps/servlet/contenthandler/stg_com_sys_powerdevelopment-platform
IBM DeveloperWorks
  Technical resources, community, blogs, toolkits, how-to articles, beta code
  www.ibm.com/developerworks/linux/
Regional Ecosystem Initiative
  Recruiting key solutions
  Greater China, North America, Europe
  Middleware and industry solutions
IBM Innovation Centers
  All 50+ centers worldwide now support Linux on Power
  One-stop for ISVs and developers
  Hardware access, technical support, demos, toolkits, hands-on labs
  www.ibm.com/systems/power/software/linux/centers
Site Ox
  On-demand cloud-based development platform using Linux on POWER8
  www.siteox.com

Performance resources for Linux on Power
Advanced Toolchain
  Power-optimized GCC
  Power-optimized runtime libraries
Power SDK
  Programming framework
  Performance profiler
  Performance guidance
IBM XL Compilers
  High-performance C/C++ and Fortran compilers
IBM Java
  High-performance Java

NVIDIA IBM Acceleration Lab
Early Access to POWER8 with NVLink Technology
  Run on the first and only systems with CPU-GPU NVLink
  Immediate performance gains from the wider bus and Tesla P100
Team up with IBM, NVIDIA on Advanced Acceleration
  Deep technical resources
  Custom plan to help migrate and optimize code together
Unlock What was Previously Impossible
  Bring new applications with unified memory and easier data movement
Apply for the program at: ibm.biz/accellab
Email for more information: accellab@us.ibm.com

The Acceleration Lab Supports All Kinds of Clients and Goals
Advanced Acceleration
  Starting point: Linux on Power, and GPU accelerated
  Needs: Performance optimization for NVLink
  Result: Optimized throughput performance
Going Parallel
  Starting point: Linux on Power and not GPU accelerated
  Needs: GPU acceleration
  Result: Ready for Advanced Acceleration
Getting to Power
  Starting point: x86 Linux, already GPU accelerated
  Needs: Linux on Power port, benchmarking
  Result: Ready for Advanced Acceleration
Starting From Scratch
  Starting point: x86 Linux, no GPU acceleration
  Needs: Power LE port or GPU acceleration
  Result: Ready for Going Parallel or Getting to Power