Blue Waters System Overview


Blue Waters Computing System (overview diagram)
- Aggregate memory: 1.5 PB
- Scuba subsystem: storage configuration for best user access, 120+ Gb/sec
- WAN connectivity: 100-300 Gbps, via a 10/40/100 Gb Ethernet switch and an IB switch to the external servers
- Online storage (Sonexion): 26 usable PB at >1 TB/sec
- Nearline storage (Spectra Logic): 300 usable PB at 100 GB/sec

National Petascale Computing Facility
Modern data center:
- 90,000+ ft² total
- 30,000 ft² with a 6-foot raised floor
- 20,000 ft² machine room gallery with no obstructions or structural support elements
Energy efficiency:
- LEED certified Gold
- Power Utilization Efficiency, PUE = 1.1-1.2
- 24 MW current capacity, expandable
- Highly instrumented
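For context (the definition is not spelled out on the slide), the PUE figure compares total facility power to the power delivered to the computing equipment; a small worked illustration:

```latex
% PUE = total facility power / IT equipment power.
% At PUE = 1.1, only about 10% of the power goes to cooling, power
% distribution and other overheads (illustrative numbers, not from the slide):
\[
  \mathrm{PUE} = \frac{P_{\text{facility}}}{P_{\text{IT}}},
  \qquad
  P_{\text{IT}} = 10\ \text{MW},\ \mathrm{PUE} = 1.1
  \;\Rightarrow\; P_{\text{facility}} = 11\ \text{MW}.
\]
```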

The Movement of Data (wide-area networking diagram)
Blue Waters in the NPCF connects through UI and ICCN DWDM/L3 infrastructure to external networks and partners, including Internet2 (I2), MREN, NLR FrameNet, NLR OmniPoP, ESnet, XSEDE and the XSEDE RP sites, and GLCPC/science teams, over links of 10, 20, 40, and 100 Gb/s.

Gemini Fabric (HSN): system layout (diagram)
Cray XE6/XK7, 276 cabinets:
- XE6 compute nodes: 5,688 blades, 22,640 nodes, 362,240 FP cores (724,480 integer cores), 4 GB per FP core
- XK7 GPU nodes: 1,057 blades, 4,228 nodes, 33,824 cores, 4,228 GPUs
- Service nodes: DSL (48 nodes), MOM (64 nodes), boot nodes, SDB nodes, RSIP nodes, network gateway nodes, unassigned nodes
- LNET routers: 582 nodes
- Boot RAID, boot cabinet, SMW
- External servers (NPCF): eslogin (4 nodes), import/export (28 nodes), HPSS data movers (50 nodes), management node, esServers cabinets, NCSAnet
- InfiniBand fabric and 10/40/100 Gb Ethernet switch; cyber protection IDPS
- Sonexion online storage: 25+ PB, 144+144+1440 OSTs
- Near-line storage: 300+ PB

Storage and external connectivity (diagram)
- Online disk: >25 PB (/home, /project, /scratch) on QDR InfiniBand at 1,200 GB/s, reached from the Cray Gemini high-speed network through LNET routers and core IB switches (300 GB/s LNET path)
- Protocols: LNET, TCP/IP (10 GbE), GridFTP (TCP/IP), SCSI (FCP)
- External servers: 4 Dell eslogin nodes and 28 Dell 720 import/export (IE) servers at 100 GB/s, plus network gateway and RSIP gateway nodes
- Site connectivity: 10 GbE/40 GbE switch, 440 Gb/s of Ethernet to the site network LAN/WAN
- Near-line path: 50 Dell 720 near-line servers with 1.2 PB of disk (IB, 40 GbE) and 300+ PB of tape over FC8, at 100 GB/s
All storage sizes are given as the amount usable.

Blue Waters Nearline/Archive System
- Spectra Logic T-Finity dual-arm robotic tape libraries; high availability and reliability with built-in redundancy
- Capacity: 380 PB (raw), 300 PB (usable)
- Bandwidth: 100 GB/sec (sustained)
- Redundant Arrays of Independent Tapes (RAIT) for increased reliability
- Largest HPSS open production system
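An illustrative back-of-the-envelope figure (not from the slides) for what the sustained bandwidth means in practice:

```latex
% Time to move 1 PB = 10^6 GB at the sustained archive bandwidth:
\[
  t = \frac{10^{6}\ \text{GB}}{100\ \text{GB/s}} = 10^{4}\ \text{s}
    \approx 2.8\ \text{hours per PB.}
\]
```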

Online Storage
- home: 144 OSTs, 2.2 PB usable, 1 TB quota
- projects: 144 OSTs, 2.2 PB usable, 5 TB group quota
- scratch: 1,440 OSTs, 22 PB usable, 500 TB group quota
Cray Sonexion with Lustre for all file systems, all visible from the compute nodes. Scratch has a 30-day purge policy in effect for both files and directories and is not backed up. ONLY the home and project file systems are backed up.

Nearline Storage (HPSS)
- home: 5 TB quota
- projects: 50 TB group quota
IBM HPSS + DDN + Spectra Logic. Accessed via GO or globus-url-copy.

GO with Globus Online
- GridFTP client development for the IE and HPSS nodes
- Enabled data striping with GridFTP
- Managed file transfers
- Command-line interface
- Globus Connect for sites without GridFTP endpoints

Blue Waters High-Speed Network (diagram)
- Interconnect network: Blue Waters 3D torus, size 24 x 24 x 24 (X, Y, Z)
- Compute nodes: Cray XE6 compute and Cray XK7 accelerator
- Service nodes: operating system boot, system database (SDB), login/network, LNET routers, login gateways, network
- External connections: login server(s), InfiniBand network(s) to the Lustre file system, GigE, Fibre Channel to the boot RAID, SMW

HSN View (topology visualization)
Gemini-node distinction by color: red = XK, orange = LNET, yellow = MOM, gray = service, blue = XE. The resulting topology is complicated.

Blue Waters XE6 Node
Blue Waters contains 22,640 XE6 compute nodes.
Node characteristics:
- Number of core modules*: 16
- Peak performance: 313 Gflops/sec
- Memory size: 64 GB per node
- Memory bandwidth (peak): 102 GB/sec
- Interconnect injection bandwidth (peak): 9.6 GB/sec per direction
*Each core module includes one 256-bit-wide FP unit and two integer units. This is often advertised as 2 cores, leading to a 32-core node.
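As a consistency check (simple arithmetic, using the 2.45 GHz clock and the 8 flops/clock noted in the CPU node comparison below):

```latex
% One XE6 node: 16 core modules, one 256-bit FP unit each, 8 flops/clock.
\[
  16 \times 2.45\ \text{GHz} \times 8\ \tfrac{\text{flops}}{\text{clock}}
  = 313.6\ \text{Gflops/sec} \approx 313\ \text{Gflops/sec}
\]
```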

XE Node NUMA and Core Complexity
- 2 sockets per XE node
- 2 NUMA domains per socket
- 4 Bulldozer FP units (core modules) per NUMA domain
- 2 integer units per FP unit
Image courtesy of Georg Hager, http://blogs.fau.de/
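To make the layout concrete, here is a minimal, generic C/OpenMP sketch (not Blue Waters-specific code; the CPU-id-to-domain mapping in the comment is an assumption for illustration) that reports where each thread runs, which is the usual first step when pinning one rank or thread team per NUMA domain:

```c
/* Placement check: each OpenMP thread reports the hardware CPU it is
 * running on, so the mapping of threads onto the node's four NUMA
 * domains can be inspected. */
#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu() */
#include <stdio.h>
#include <omp.h>

int main(void)
{
#pragma omp parallel
    {
        /* On a 32-integer-core XE6 node, CPU ids 0-7, 8-15, 16-23 and
         * 24-31 would correspond to the four NUMA domains (an assumed,
         * illustrative mapping; actual numbering is system-dependent). */
        printf("thread %2d of %2d running on cpu %2d\n",
               omp_get_thread_num(), omp_get_num_threads(), sched_getcpu());
    }
    return 0;
}
```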

CPU Node Comparison

Node                   | Processor type              | Nominal clock freq. (GHz) | FPU cores | Peak GF/s | Peak GB/s
-----------------------|-----------------------------|---------------------------|-----------|-----------|----------
Blue Waters Cray XE    | AMD 6276 Interlagos         | 2.45                      | 16*       | 313       | 102
NICS Kraken Cray XT    | AMD Istanbul                | 2.6                       | 12        | 125       | 25.6
NERSC Hopper XE        | AMD 6172 MagnyCours         | 2.1                       | 24        | 202       | 85.3
ANL IBM BG/P           | PowerPC 450                 | 0.85                      | 4         | 13.6      | 13.6
ANL IBM BG/Q           | IBM A2                      | 1.6                       | 16*       | 205       | 42.6
NCAR Yellowstone       | Intel E5-2670 Sandy Bridge  | 2.6                       | 16*       | 333       | 102
NICS Darter Cray XC30  | Intel E5-2600 Sandy Bridge  | 2.6                       | 16*       | 333       | 102

An * indicates processors with 8 flops per clock period.
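A quick check of how the footnote explains otherwise similar rows (simple arithmetic, not from the slides): peak GF/s = FPU cores x clock x flops per clock.

```latex
\[
  \text{Hopper (no *): } 24 \times 2.1 \times 4 = 201.6 \approx 202\ \text{GF/s},
  \qquad
  \text{BG/Q (*): } 16 \times 1.6 \times 8 = 204.8 \approx 205\ \text{GF/s}.
\]
```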

Cray XK7
Blue Waters contains 4,228 NVIDIA K20x (GK110) GPUs.
XK7 compute node characteristics:
- Host processor: AMD Series 6200 (Interlagos), connected to the GPU over PCIe Gen2
- Host processor performance: 156.8 Gflops
- K20x peak (DP floating point): 1.32 Tflops
- Host memory: 32 GB, 51 GB/sec
- K20x memory: 6 GB GDDR5 capacity, 235 GB/sec, ECC
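Both peak figures can be reproduced from the parts above (a sketch; the 64 FP64 units per GK110 SMX is an assumption taken from NVIDIA's published architecture description, not from the slide):

```latex
% Host: one Interlagos socket, 8 core modules x 2.45 GHz x 8 flops/clock.
\[
  8 \times 2.45\ \text{GHz} \times 8\ \tfrac{\text{flops}}{\text{clock}}
  = 156.8\ \text{Gflops}
\]
% K20x double precision: 14 SMX x 64 FP64 units x 2 flops (FMA) x 0.732 GHz,
% close to the quoted 1.32 Tflops.
\[
  14 \times 64 \times 2 \times 0.732\ \text{GHz} \approx 1.31\ \text{Tflops}
\]
```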

NVIDIA K20x
Compute complexity; additional memory hierarchy and types.
- 14 multiprocessors, 192 CUDA cores/MP: 2,688 CUDA cores
- GPU clock rate: 732 MHz (0.73 GHz)
- Memory clock rate: 2600 MHz
- Memory bus width: 384-bit
- L2 cache size: 1,572,864 bytes
- Maximum texture dimension size (x, y, z): 1D = (65536), 2D = (65536, 65536), 3D = (4096, 4096, 4096)
- Maximum layered 1D texture size, (num) layers: 1D = (16384), 2048 layers
- Maximum layered 2D texture size, (num) layers: 2D = (16384, 16384), 2048 layers
- Total amount of constant memory: 65,536 bytes
- Total amount of shared memory per block: 49,152 bytes
- Total number of registers available per block: 65,536
- Warp size: 32
- Maximum number of threads per multiprocessor: 2,048
- Maximum number of threads per block: 1,024
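The list above reads like the output of a device-properties query; a minimal, generic CUDA C sketch (not Blue Waters-specific) using the standard runtime call cudaGetDeviceProperties reproduces a subset of these fields:

```c
/* Query a subset of the device properties listed above for device 0. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  /* device 0 */
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("name                  : %s\n",  prop.name);
    printf("multiprocessors       : %d\n",  prop.multiProcessorCount);   /* 14 on K20x */
    printf("clock rate            : %d kHz\n", prop.clockRate);          /* ~732000 */
    printf("memory clock rate     : %d kHz\n", prop.memoryClockRate);
    printf("memory bus width      : %d bits\n", prop.memoryBusWidth);    /* 384 */
    printf("L2 cache size         : %d bytes\n", prop.l2CacheSize);
    printf("constant memory       : %zu bytes\n", prop.totalConstMem);
    printf("shared mem per block  : %zu bytes\n", prop.sharedMemPerBlock);
    printf("registers per block   : %d\n",  prop.regsPerBlock);
    printf("warp size             : %d\n",  prop.warpSize);
    printf("max threads per SM    : %d\n",  prop.maxThreadsPerMultiProcessor);
    printf("max threads per block : %d\n",  prop.maxThreadsPerBlock);
    return 0;
}
```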

XK Features
- Hardware-accelerated OpenGL with an X11 server; not standard support from the vendor.
- GPU operation mode flipped to allow display functionality (was compute-only).
- X server enabled/disabled at job start/end when specified by the user.
- Several teams use XK nodes for visualization to avoid transferring large amounts of data, shortening their workflow.

Blue Waters Software Environment
- Languages: Fortran, C, C++, Python, UPC
- Compilers: Cray Compiling Environment (CCE), GNU
- Programming models: distributed memory (Cray MPT: MPI, SHMEM); shared memory (OpenMP 3.0); traditional PGAS and global view (UPC (CCE), CAF (CCE))
- I/O libraries: NetCDF, HDF5, ADIOS
- Optimized scientific libraries: LAPACK, ScaLAPACK, BLAS (libgoto), Iterative Refinement Toolkit, Cray Adaptive FFTs (CRAFFT), FFTW, Cray PETSc (with CASK), Cray Trilinos (with CASK)
- Performance analysis: Cray Performance Monitoring and Analysis Tool, PAPI, PerfSuite, Tau
- Debugging support tools: Allinea DDT, lgdb, Fast Track Debugger (CCE with DDT), Abnormal Termination Processing, STAT, Cray Comparative Debugger#
- Tools and environment: resource manager, environment setup (Modules), programming environment, Eclipse
- Visualization: VisIt, ParaView, yt
- Data transfer: GO, HPSS
- Operating system: Cray Linux Environment (CLE)/SUSE Linux
(Legend in the original slide: Cray developed; under development; licensed ISV SW; 3rd-party packaging; NCSA supported; Cray added value to 3rd party.)

Reliability
We provide users a checkpoint interval calculator based on the work of J. Daly, using recent node and system interrupt data. The user inputs the number of XE and/or XK nodes and the time to write a checkpoint file.
September data:
- 22,640 XE nodes: MTTI ~ 14 hrs
- 4,228 XK nodes: MTTI ~ 32 hrs
- System interrupts: MTTI ~ 100 hrs
Checkpoint intervals are on the order of 4-6 hrs at full system (depending on the time to write a checkpoint).
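For reference, a sketch of the first-order optimum from Daly's model (the calculator may use the higher-order expression), with delta the checkpoint write time and M the mean time to interrupt; the delta = 1 hr value below is purely illustrative:

```latex
\[
  \tau_{\mathrm{opt}} \approx \sqrt{2\,\delta M} - \delta,
  \qquad
  \delta = 1\ \text{hr},\ M = 14\ \text{hr}
  \;\Rightarrow\; \tau_{\mathrm{opt}} \approx \sqrt{28} - 1 \approx 4.3\ \text{hr},
\]
```

which is consistent with the quoted 4-6 hr range.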

Access to Blue Waters
- National Science Foundation: at least 80 percent of the capacity of Blue Waters, about 150 million node-hours each year.
- University of Illinois at Urbana-Champaign: up to 7 percent of the computing capacity, about 13 million node-hours each year, is reserved for faculty and staff at the University of Illinois at Urbana-Champaign.
- Great Lakes Consortium for Petascale Computation: up to 2 percent is available to researchers whose institutions are members of the Great Lakes Consortium for Petascale Computation.
- Education: up to 1 percent of the Blue Waters compute capacity, or 1.8 million node-hours per year, is available for educators and students.
- Industry: NCSA's industry partners and industry partners of any of the Great Lakes Consortium for Petascale Computation institutions also have opportunities to use Blue Waters.
- Innovation and Exploration: up to 5 percent.

Who Runs on Blue Waters (map of counts by US state and country)
US states: AL 15, AZ 3, CA 108, CO 14, CT 2, FL 20, GA 8, IA 8, IL 377, IN 52, KY 3, LA 8, MA 1, MD 10, MI 17, MN 21, MO 3, MS 2, NC 23, NH 1, NJ 23, NM 6, NV 3, NY 23, OH 25, OK 1, OR 4, PA 23, PR 3, RI 4, SC 2, SD 1, TN 20, TX 8, UT 17, VA 26, WA 5, WV 1, WY 5
International: AU 1, CA 3, CY 1, CZ 2, DE 3, DK 4, ES 1, GB 2, IT 2, MY 1, NO 3, PT 1


Summary
Outstanding computing system:
- The largest installation of Cray's most advanced technology
- Extreme-scale Lustre file system with advances in reliability/maintainability
- Extreme-scale archive with advanced RAIT capability
- The most balanced system in the open community: Blue Waters is capable of addressing science problems that are memory, storage, compute, or network intensive, or any combination of these.
- Use of innovative technologies provides a path to future systems. Illinois/NCSA is a leader in developing and deploying these technologies, as well as contributing to community efforts.