IBM CORAL HPC System Solution


HPC and HPDA towards Cognitive, AI and Deep Learning
- Deep Learning: AI / Deep Learning strategy for Power; PowerAI platform
- High Performance Data Analytics: Big Data strategy on Power; HDP + Spectrum Scale; HDP + Power; Conductor with Spark
- Traditional High Performance Computing: large-scale high-performance (CORAL) solution; IBM Power HPC stack; cluster solution

Our Compute System direction is based on Heterogeneous Processing and Big Data
- Heterogeneous core types allow for optimizations in power and chip area used
- Computation methods address both serial and parallel regions: power-efficient, frequency-optimized cores can be used for parallel regions, while high-performance cores can be used for serial sections
- Different memory characteristics can be optimized for the computational approach: high bandwidth, low latency, or energy efficiency
- Design focuses on common shared memory for programmability and data access
- Big Data drives the need for large memory and high bandwidth to data for all compute units

IBM Power Systems: Open to the Core
- OpenPOWER: >300 members
- Open source workloads
- OpenCAPI
- Open frameworks
- Expanding the ecosystem

Leveraging OpenPOWER for CORAL Hardware

IBM Power Systems S822LC (Minsky):
- CPU: up to 12 SMT8 cores; CAPI acceleration; adaptive power management
- NVLink-1: 5x faster than PCIe Gen 3
- GPU: up to 5.2 TF/GPU; stacked memory for increased bandwidth, capacity and energy efficiency; enhanced unified memory

IBM Power Systems AC922 (Newell), POWER9 server for HPC & AI:
- P9 CPU: up to 24 SMT4 cores; DDR4 memory at 170 GB/s; CAPI v2; PCIe Gen 4; superior core performance
- NVLink-2: 7-10x faster than PCIe Gen 3; 100 GB/s links to each Tesla V100
- Tesla V100 GPUs: up to 7.8 TF/GPU; next-generation high-bandwidth memory; memory coherency (the GPU can access the CPU's page tables)
- 100 Gb/s EDR InfiniBand: in-network computing with SHARP; adaptive routing; native RDMA; NVMe-over-Fabrics offload; hardware tag matching (automates point-to-point communication); MPI rendezvous protocol offload; precision time protocol support

Results:
- Billion-cell reservoir simulation in record time (92 minutes vs 20 hours)
- ResNet-50 90-epoch training in the lowest time (7 hours vs 10 days) with the highest accuracy (33.8% vs 29.8%)
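Taken at face value, the benchmark claims above imply concrete speedup factors; a quick check using only the figures quoted on this slide (the hardware and software details of the baseline runs are not given here):

```python
# Speedups implied by the slide's own benchmark figures.

# Billion-cell reservoir simulation: 20 hours down to 92 minutes
reservoir_speedup = (20 * 60) / 92
assert 13.0 < reservoir_speedup < 13.1   # roughly 13x

# ResNet-50, 90-epoch training: 10 days down to 7 hours
resnet_speedup = (10 * 24) / 7
assert 34.2 < resnet_speedup < 34.3      # roughly 34x

print(f"reservoir: {reservoir_speedup:.1f}x, ResNet-50: {resnet_speedup:.1f}x")
```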

Software Strategy Builds on CORAL Collaboration, Open Source, and OpenPOWER

Goal: provide a complete HPC software solution for Power Systems that is fully integrated, tested and supported, exploits Power hardware features, and focuses on maintainability and reliability.

- xCAT (open source): hardware discovery and provisioning; supports multiple hardware and installation options
- CSM: new system management software developed for CORAL; integrates system management and job management functions; jitter-aware design; data collection aligned to the system clock; Big Data for RAS analysis; open source in 2018
- Burst Buffer: new system productivity feature for CORAL; asynchronous checkpoint transfer from on-node NVMe to the file system; stages data in to expedite job start; stage-out releases the CPU to start a new job; flash wear, health and usage monitoring; open source in 2018
- Spectrum Scale integrated file system: single namespace up to 250 petabytes; 2.5 TB/s large-block sequential I/O performance; 2.6M file creates/sec for 32 KB files in unique directories; 50K file creates/sec to a single shared directory
- ESSL/PESSL engineering and scientific subroutine libraries: tuned for each Power architecture; serial, SMP (OpenMP), GPU (CUDA), SIMD (VSX) and SPMD (MPI) subroutines; callable from Fortran, C, and C++; Parallel ESSL SMP libraries
- Compilers: rich collection supporting multiple programming models (CUDA, OpenMP 4.x, OpenACC); optimized for Power; open-source and fully supported proprietary compiler options
- Spectrum LSF job and workflow management: superior job throughput at scale; enhanced job feedback for users and administrators; RESTful APIs to simplify integration into business processes; absolute priority scheduling; enhancements for advance reservations; pending job limits
- Spectrum MPI and the Parallel Performance Toolkit (PPT): optimized MPI solution based on Open MPI; new features driven by CORAL innovation, including enablement of unique hardware features; high-performance collective library; enhanced multithreaded performance; the IBM Parallel Performance Toolkit for Power analyzes application performance, including a single timeline tracking CPU, GPU and MPI activity
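The burst-buffer idea described above (write the checkpoint to fast node-local NVMe, then drain it to the parallel file system asynchronously so compute can resume at once) can be sketched in a few lines of Python. This is a conceptual illustration only, with temporary directories standing in for the NVMe device and the parallel file system; the real CORAL burst buffer is a system service, not application code:

```python
import shutil
import tempfile
import threading
from pathlib import Path

def checkpoint_and_drain(state: bytes, nvme_dir: Path, pfs_dir: Path) -> threading.Thread:
    """Write a checkpoint to fast node-local storage, then drain it to the
    parallel file system in the background so compute can resume immediately."""
    local_ckpt = nvme_dir / "ckpt.bin"
    local_ckpt.write_bytes(state)            # fast write to node-local NVMe

    def drain() -> None:                     # asynchronous stage-out
        shutil.copy2(local_ckpt, pfs_dir / "ckpt.bin")

    t = threading.Thread(target=drain)
    t.start()                                # application keeps computing
    return t

# Demo: temporary directories stand in for the NVMe device and the parallel FS.
nvme = Path(tempfile.mkdtemp())
pfs = Path(tempfile.mkdtemp())
drainer = checkpoint_and_drain(b"application state", nvme, pfs)
drainer.join()                               # a real job would not block here
print((pfs / "ckpt.bin").read_bytes())       # b'application state'
```

The key design point the slide makes is the second benefit: because stage-out runs independently of the job, the scheduler can hand the CPUs to the next job while the previous job's checkpoint is still draining.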

CORAL HPC System Overview

Compute rack:
- Over 850 TF peak performance per rack
- Up to 44 POWER9 cores and up to 6 GPUs per server
- Air- and water-cooled versions
- Power efficient; minimizes jitter impacts
- Multiple programming models and a supported development environment; open-source friendly

IBM HPC software solution: Parallel Performance Toolkit for Power; Spectrum MPI; ESSL/PESSL; Spectrum Scale; Platform LSF family; xCAT; XL, PGI and LLVM compilers; NVIDIA CUDA; CSM; Burst Buffer.

IBM Spectrum Scale:
- Single namespace up to 250 petabytes
- 2.5 TB/s large-block sequential I/O performance
- 2.6M file creates/sec for 32 KB files in unique directories; 50K file creates/sec to a single shared directory
- Spectrum Scale RAID with declustered erasure coding

Burst Buffer: asynchronous checkpoint file transfer; new CORAL system feature.

CSM: a collaboration between IBM and the CORAL labs; provides the integration framework for next-level system management; leverages Big Data capabilities (CSM with IBM Spectrum LSF, xCAT (open source) and IBM Log Analytics) to solve tough system management issues; open source in 2018.

System composition: compute racks of IBM POWER9 servers for HPC & AI with NVLink (breakthrough performance for GPU-accelerated applications), login/launch servers, gateway servers, a workload manager server, a management server, and service servers, all connected by a high-speed InfiniBand network.

CORAL: IBM Delivers Summit and Sierra, AI Exascale Today
- Order-of-magnitude leap in computational power: 200 PF (vs 20 PF); 3+ EFLOPS of tensor ops; 10x performance over Titan
- Real, accelerated science: application codes include ACME, DIRAC, FLASH, GTC, HACC, LSDALTON, NAMD, NUCCOR, NWCHEM, QMCPACK, RAPTOR, SPECFEM and XGC
- 5-10x application performance over Titan (AMD/NVIDIA), achieved with about one quarter of the servers
- Deployments beginning, with full acceptance in 2018


CORAL ORNL (Summit): 200 PF system

POWER9 CPU: 22 cores; 4 threads/core; 0.54 DP TF/s; 3.07 GHz

POWER9 2-socket server:
- 2 P9 CPUs + 6 Volta GPUs
- 512 GiB SMP memory (32 GiB DDR4 RDIMMs)
- 96 GiB GPU memory (HBM stacks)
- 1.6 TB NVMe
- Standard 2U 19 in. rack-mount chassis

Volta GPU: 7.0 DP TF/s; 16 GB HBM at 0.9 TB/s; SXM2 module

Interconnect: Mellanox IB 4X EDR switches; Mellanox IB 4X EDR 648-port directors; full-bisection system

Storage (ESS building block): SSC (4 ESS GL4) of 8 servers and 16 JBODs; 16.8 PB (gross); 38 kW max

Compute rack: 18 nodes; 775 TF/s; 10.7 TiB; 59 kW max

Floor plan rack counts: compute (256), switch (18), storage (40), infrastructure (4)
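The per-rack figures follow directly from the quoted node specs; a quick arithmetic cross-check using only numbers from this slide:

```python
# Cross-check of the Summit node and rack figures quoted on this slide.
P9_TF = 0.54       # DP TF/s per POWER9 socket (22 cores at 3.07 GHz)
VOLTA_TF = 7.0     # DP TF/s per Volta GPU

node_tf = 2 * P9_TF + 6 * VOLTA_TF       # two sockets, six GPUs
assert abs(node_tf - 43.08) < 1e-9       # ~43 TF/s per node

rack_tf = 18 * node_tf                   # 18 nodes per compute rack
assert abs(rack_tf - 775.44) < 1e-9      # matches the 775 TF/s rack figure

# 10.7 TiB per rack = (512 GiB DDR4 + 96 GiB HBM) per node, times 18 nodes
rack_tib = 18 * (512 + 96) / 1024
assert round(rack_tib, 1) == 10.7

print(f"{node_tf:.2f} TF/node, {rack_tf:.2f} TF/rack, {rack_tib:.4f} TiB/rack")
```

Note that the 10.7 TiB rack figure only works out if the GPUs' HBM is counted alongside the DDR4, which is consistent with the slide's unified-memory emphasis.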

CORAL: ORNL Summit, overhead H2O distribution


CORAL LLNL (Sierra): 125 PF system

POWER9 CPU: 22 cores; 4 threads/core; 0.54 DP TF/s; 3.07 GHz

POWER9 2-socket server:
- 2 P9 CPUs + 4 Volta GPUs
- 256 GiB SMP memory (16 GiB DDR4 RDIMMs)
- 64 GiB GPU memory (HBM stacks)
- 1.6 TB NVMe
- Standard 2U 19 in. rack-mount chassis

Volta GPU: 7.0 DP TF/s; 16 GB HBM at 0.9 TB/s; SXM2 module

Interconnect: Mellanox IB 4X EDR switches; Mellanox IB 4X EDR 648-port directors; half-bisection system

Storage (ESS building block): SSC (4 ESS GL4) of 8 servers and 16 JBODs; 16.8 PB (gross); 38 kW max

Compute rack: 18 nodes; 523 TF/s; 5.6 TiB; 45 kW max

Floor plan rack counts: compute (240), switch (9), storage (24), infrastructure (4)
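The same arithmetic reproduces Sierra's rack figures, and scaling by the floor-plan compute-rack counts recovers both systems' headline peaks (all inputs are numbers quoted on these two system slides):

```python
# Sierra rack figures and both system peaks from the quoted per-part numbers.
P9_TF, VOLTA_TF = 0.54, 7.0

sierra_node = 2 * P9_TF + 4 * VOLTA_TF   # Sierra nodes carry 4 GPUs, not 6
sierra_rack = 18 * sierra_node
assert abs(sierra_rack - 523.44) < 1e-9  # matches the 523 TF/s rack figure

sierra_tib = 18 * (256 + 64) / 1024      # DDR4 + HBM per node, 18 nodes
assert round(sierra_tib, 3) == 5.625     # quoted as 5.6 TiB

# System peaks from the compute-rack counts on the floor plans:
assert round(240 * sierra_rack / 1000, 1) == 125.6   # ~125 PF Sierra
summit_rack = 18 * (2 * P9_TF + 6 * VOLTA_TF)
assert round(256 * summit_rack / 1000, 1) == 198.5   # ~200 PF Summit

print(f"Sierra {240 * sierra_rack / 1000:.1f} PF, Summit {256 * summit_rack / 1000:.1f} PF")
```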

CORAL: LLNL Sierra

Thank you! ibm.com/systems/hpc