Illinois Proposal Considerations Greg Bauer

Illinois Proposal Considerations, 2016. Greg Bauer.

Support model. Blue Waters provides traditional Partner Consulting as part of its User Services. Standard service requests for assistance with porting, debugging, allocation issues, and software requests are handled through the Jira ticket system. Advanced Application Support for projects on Blue Waters can also be requested via the ticket system; these requests are reviewed and evaluated by the project office for breadth, reach, and impact for the project and the community at large. Major science teams (such as NSF PRAC awards) are assigned a Point of Contact (POC) within the Science and Engineering Application Support (SEAS) group. The POC works with the science team on advanced application support areas such as tuning, performance modeling, I/O, and optimization of application codes, as well as on standard service requests. In some cases POCs participate in code restructuring, re-engineering, or redesign, such as implementing GPU functionality via OpenACC or alternatives to MPI collective operations. Such work is tracked via a coordinated work plan, developed between the POC and the science team, that clearly indicates the scope and scale of the work involved, including milestones and deliverables. Work plans are reviewed and approved or rejected by the Blue Waters project office. Support for workflows, data movement, and visualization is provided by the corresponding groups within Blue Waters.
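As a purely illustrative sketch of the kind of OpenACC work mentioned above (the routine and array names here are hypothetical, not taken from any Blue Waters code):

#include <stdlib.h>

/* Hypothetical example: offloading a simple vector update to the GPU with
 * OpenACC, the sort of incremental port a POC might assist with. */
void daxpy_acc(int n, double a, const double *restrict x, double *restrict y)
{
    /* "parallel loop" asks the compiler to generate device code; the data
     * clauses describe what must be copied to and from the GPU. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

A common next step in such ports is to hoist the data movement out of the hot loop with OpenACC data regions so arrays stay resident on the GPU across calls.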

What makes Blue Waters a supercomputer? Isn't it just a really large Linux cluster? Yes and no. It runs some commodity hardware (processor, memory, GPU), but it has a proprietary interconnect, a tuned Linux OS, and a very large and fast parallel file system. The scale of the system is important. "My [desktop, local cluster, ...] runs my application more quickly than Blue Waters. Why?" Consider the following slides.

Not All Compute Nodes are Equal: https://bluewaters.ncsa.illinois.edu/node_core_comparison

Local Resource Comparison

Property            | Campus Cluster (Golub)                                       | Blue Waters
# of compute nodes  | 512                                                          | 22,640 XE + 4,224 XK
Processors          | (2) Intel E5-2670 or (2) E5-2670V2, 2.5-2.6 GHz, 20-25 MB L3 | (2 or 1) AMD 6276, 2.45 GHz, 16 MB L3
Memory per node     | 32 GB to 256 GB                                              | 64 GB (XE) or 32 GB (XK)
GPUs                | NVIDIA M2090, K40                                            | NVIDIA K20X
Interconnect NIC    | Mellanox FDR IB, 7 GB/s                                      | Cray Gemini, 9.6 GB/s
File system         | GPFS                                                         | Lustre, ~1 TB/s write
Swap                | Local disk                                                   | No local disk

Campus Cluster Single Node Performance. The compute nodes on Campus Cluster are Intel-based: https://campuscluster.illinois.edu/hardware/. A node with (2) Intel E5-2670 (Sandy Bridge), 2.60 GHz, 20 MB cache, 8 cores each, has peaks of 333 GF/node and 102 GB/s of memory bandwidth per node; its performance can be better than that of an XE node. A node with (2) Intel E5-2670V2 (Ivy Bridge), 2.50 GHz, 25 MB cache, 10 cores each, has a peak of 400 GF/node (2 CPUs x 2.5 GHz x 10 cores/CPU x 8 flops per core per cycle) and a peak of 120 GB/s per node; its performance can also be better than that of an XE node.
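Those peak numbers follow from sockets x cores x clock x flops per cycle; a minimal sketch of the arithmetic (the 8 flops/cycle figure assumes AVX double precision on Sandy/Ivy Bridge):

#include <stdio.h>

/* Theoretical peak = sockets x cores/socket x GHz x flops per core per cycle. */
static double peak_gflops(int sockets, int cores, double ghz, int flops_per_cycle)
{
    return sockets * cores * ghz * flops_per_cycle;
}

int main(void)
{
    printf("E5-2670   (Sandy Bridge): %.0f GF/node\n", peak_gflops(2,  8, 2.6, 8)); /* ~333 GF */
    printf("E5-2670V2 (Ivy Bridge):   %.0f GF/node\n", peak_gflops(2, 10, 2.5, 8)); /* 400 GF  */
    return 0;
}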

One caveat: Intel (Sandy Bridge) L1/L2 caches are faster than AMD (Bulldozer) L1/L2 caches. For small N the working set fits in the L1 and L2 caches; note the difference in performance rate between Sandy Bridge and Bulldozer in that regime. As N increases, the data sits in L3 or RAM and the rates become equal.
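A rough way to see why N matters is to compare the working set against the cache sizes. A minimal sketch, assuming the benchmark behaves like a dense matrix multiply with three N x N double-precision operands (the cache thresholds are illustrative, Sandy Bridge-like figures, not exact values for either chip):

#include <stdio.h>

int main(void)
{
    /* Rough working set of C = A*B with three NxN double matrices.
     * Thresholds are illustrative: ~32 KiB L1 and ~256 KiB L2 per core,
     * ~16-25 MiB shared L3. */
    for (int n = 32; n <= 2048; n *= 2) {
        double kib = 3.0 * n * n * sizeof(double) / 1024.0;
        const char *level = kib <= 32    ? "fits in L1"
                          : kib <= 256   ? "fits in L2"
                          : kib <= 16384 ? "spills to L3"
                          :                "spills to RAM";
        printf("N = %4d  working set ~ %8.0f KiB  (%s)\n", n, kib, level);
    }
    return 0;
}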

Things to include in a proposal: scaling data is useful if you have it, both strong scaling (fixed total problem size as node count increases) and weak scaling (fixed problem size per node as node count increases).
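For reference, the quantities usually reported are speedup and parallel efficiency; a minimal sketch of how they are computed (the node counts and timings below are invented purely for illustration):

#include <stdio.h>

int main(void)
{
    /* Hypothetical strong-scaling run: the same problem on 1, 2, 4, 8 nodes. */
    int    nodes[]  = {1, 2, 4, 8};
    double time_s[] = {100.0, 52.0, 27.0, 15.0};

    for (int i = 0; i < 4; ++i) {
        double speedup    = time_s[0] / time_s[i];
        double efficiency = speedup / nodes[i];
        printf("%d nodes: speedup %.2f, efficiency %.0f%%\n",
               nodes[i], speedup, 100.0 * efficiency);
    }
    /* For weak scaling the problem size grows with the node count, so the
     * efficiency is simply T(1 node) / T(p nodes) for the scaled problem. */
    return 0;
}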

Things to consider in a proposal, I/O: storage requirements, typical file sizes, number of files, residency, file format(s), and the application checkpoint strategy.
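For the checkpoint strategy in particular, a back-of-the-envelope estimate of checkpoint size and write time is usually enough; a minimal sketch (the rank count and per-rank state size are hypothetical, and the 1 TB/s figure is the peak Lustre write rate quoted earlier, which applications rarely sustain):

#include <stdio.h>

int main(void)
{
    /* Hypothetical job: 8192 ranks, each writing 2 GB of state per checkpoint. */
    double ranks         = 8192.0;
    double gb_per_rank   = 2.0;
    double checkpoint_tb = ranks * gb_per_rank / 1000.0;   /* ~16.4 TB */
    double peak_tb_per_s = 1.0;                            /* quoted peak write rate */

    printf("Checkpoint size:     %.1f TB\n", checkpoint_tb);
    printf("Write time at peak: ~%.0f s (sustained application I/O is usually far slower)\n",
           checkpoint_tb / peak_tb_per_s);
    return 0;
}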

Runtime and Programming Environment No local disk, no swap. No ssh access to compute nodes (see CCM). Kernel and stack SLES 11 SP3 based. CUDA 7.5 (maybe 8 at some point) No VM or OS ISO boot support. (Shifter) Support for Linux cluster compatibility (CCM). Current compilers (Cray, PGI, GNU, Intel), Math libraries, MPI-3, Python, R. No MATLAB. 10