
NCAR's Data-Centric Supercomputing Environment: Yellowstone
November 29, 2011
David L. Hart, CISL, dhart@ucar.edu

Welcome to the Petascale
- Yellowstone hardware and software
- Deployment schedule
- Allocation opportunities at NWSC: ASD, University, CSL, NCAR, and Wyoming-NCAR alliance

NCAR Resources at the NCAR-Wyoming Supercomputing Center (NWSC)
NWSC-1 procurement:
- Centralized filesystems and data storage: 10.9 PB initially, 16.4 PB in 1Q2014; 12x the usable capacity of current GLADE
- High-performance computing: IBM iDataPlex cluster, 1.55 PFLOPs; 30x Bluefire computing performance
- Data analysis and visualization: large-memory system, GPU computation and visualization system, Knights Corner system
AMSTAR procurement:
- NCAR HPSS data archive: 2 StorageTek SL8500 tape libraries, 20k cartridge slots, >100 PB capacity with 5 TB cartridges (uncompressed)

Yellowstone Environment (system diagram)
- Yellowstone: HPC resource, 1.55 PFLOPS peak
- Geyser & Caldera: DAV clusters
- GLADE: central disk resource, 11 PB (2012), 16.4 PB (2014)
- High-bandwidth, low-latency HPC and I/O networks: FDR InfiniBand and 10Gb Ethernet; 1Gb/10Gb Ethernet (40Gb+ future)
- NCAR HPSS archive: 100 PB capacity, ~15 PB/yr growth
- Science gateways (RDA, ESG), data transfer services, remote visualization
- Connections to partner sites and XSEDE sites

Yellowstone: NWSC High-Performance Computing Resource
Batch computation:
- 74,592 cores total; 1.552 PFLOPs peak
- 4,662 IBM dx360 M4 nodes; 16 cores, 32 GB memory per node
- Intel Sandy Bridge EP processors with AVX, 2.6 GHz clock
- 149.2 TB total DDR3-1600 memory
- 29.8 Bluefire equivalents
High-performance interconnect:
- Mellanox FDR InfiniBand, full fat-tree
- 13.6 GB/s bidirectional bandwidth/node; <2.5 µs latency (worst case)
- 31.7 TB/s bisection bandwidth
Login/interactive:
- 6 IBM x3650 M4 nodes; Intel Sandy Bridge EP processors with AVX; 16 cores & 128 GB memory per node
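
The headline figures above can be cross-checked from the per-node numbers. A minimal sketch of that arithmetic follows, assuming 8 double-precision FLOPs per cycle per core for Sandy Bridge with AVX; the assumption is illustrative, not taken from the original slides.

```python
# Back-of-the-envelope check of the Yellowstone figures quoted above.
# Assumes 8 double-precision FLOPs per cycle per core (Sandy Bridge with AVX:
# one 4-wide add plus one 4-wide multiply per cycle) -- an illustration, not vendor data.

nodes = 4662
cores_per_node = 16
mem_per_node_gb = 32
clock_ghz = 2.6
flops_per_cycle = 8

cores = nodes * cores_per_node                            # 74,592 cores
total_mem_tb = nodes * mem_per_node_gb / 1000             # ~149.2 TB DDR3-1600
peak_pflops = cores * clock_ghz * flops_per_cycle / 1e6   # GFLOP/s -> PFLOP/s

print(cores, round(total_mem_tb, 1), round(peak_pflops, 3))
# 74592 149.2 1.552
```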

NCAR HPC Profile
[Chart: peak PFLOPs at NCAR, Jan 2004 through Jan 2016, showing bluesky (IBM POWER4/Colony), bluevista (IBM p575/8 POWER5/HPS), blueice (IBM p575/16 POWER5+/HPS), bluefire (IBM Power 575/32 POWER6/DDR-IB), frost (IBM BlueGene/L, plus upgrade), lynx (Cray XT5m), and yellowstone (IBM iDataPlex/FDR-IB) at roughly 1.5 PFLOPs, about 30x Bluefire performance.]

GLADE
- 10.94 PB usable capacity; 16.42 PB usable (1Q2014)
- Estimated initial file system sizes:
  - collections: 2 PB (RDA, CMIP5 data)
  - scratch: 5 PB (shared, temporary space)
  - projects: 3 PB (long-term, allocated space)
  - users: 1 PB (medium-term work space)
- Disk storage subsystem: 76 IBM DCS3700 controllers & expansion drawers; 90 2-TB NL-SAS drives/controller, adding 30 3-TB NL-SAS drives/controller (1Q2014)
- GPFS NSD servers: 91.8 GB/s aggregate I/O bandwidth; 19 IBM x3650 M4 nodes
- I/O aggregator servers (GPFS, GLADE-HPSS connectivity): 10-GbE & FDR interfaces; 4 IBM x3650 M4 nodes
- High-performance I/O interconnect to HPC & DAV: Mellanox FDR InfiniBand, full fat-tree; 13.6 GB/s bidirectional bandwidth/node
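
As a rough consistency check on the capacity figures, the sketch below relates raw drive capacity to the quoted usable numbers; the implied ~0.8 usable-to-raw ratio is interpreted here as an 8+2 RAID-6-style layout, which is an assumption for illustration rather than a statement of the actual GLADE configuration.

```python
# Rough consistency check on the GLADE capacity numbers. The usable/raw ratio is
# treated as if it came from an 8+2 RAID-6-style layout; that layout is an
# assumption for illustration, not a description of the actual configuration.

controllers = 76
raw_initial_pb = controllers * 90 * 2 / 1000    # 90 x 2-TB NL-SAS drives per controller
raw_added_pb   = controllers * 30 * 3 / 1000    # +30 x 3-TB drives per controller (1Q2014)

usable_ratio = 10.94 / raw_initial_pb           # ~0.80, i.e. roughly 8 data disks in 10
usable_2014  = 10.94 + raw_added_pb * usable_ratio
bw_per_nsd_gbs = 91.8 / 19                      # aggregate bandwidth per GPFS NSD server

print(round(raw_initial_pb, 2), round(usable_ratio, 2),
      round(usable_2014, 2), round(bw_per_nsd_gbs, 1))
# 13.68 0.8 16.41 4.8
```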

NCAR Disk Capacity Profile
[Chart: total centralized filesystem storage, in PB of total usable storage, Jan 2010 through Jan 2016, showing bluefire-era storage, GLADE, and NWSC-1 contributions growing toward roughly 16-18 PB.]

Geyser and Caldera: NWSC Data Analysis & Visualization Resource
Geyser (large-memory system):
- 16 IBM x3850 nodes; Intel Westmere-EX processors
- 40 cores, 1 TB memory, 1 NVIDIA Kepler Q13H-3 GPU per node
- Mellanox FDR full fat-tree interconnect
Caldera (GPU computation/visualization system):
- 16 IBM x360 M4 nodes; Intel Sandy Bridge EP/AVX
- 16 cores, 64 GB memory per node; 2 NVIDIA Kepler Q13H-3 GPUs per node
- Mellanox FDR full fat-tree interconnect
Knights Corner system (November 2012 delivery):
- Intel Many Integrated Core (MIC) architecture
- 16 IBM Knights Corner nodes; 16 Sandy Bridge EP/AVX cores, 64 GB memory, 1 Knights Corner adapter per node
- Mellanox FDR full fat-tree interconnect

Erebus: Antarctic Mesoscale Prediction System (AMPS)
IBM iDataPlex compute cluster:
- 84 IBM dx360 M4 nodes; 16 cores, 32 GB memory per node
- Intel Sandy Bridge EP, 2.6 GHz clock
- 1,344 cores total; 28 TFLOPs peak
- Mellanox FDR InfiniBand, full fat-tree
- 0.54 Bluefire equivalents
Login nodes:
- 2 IBM x3650 M4 nodes; 16 cores & 128 GB memory per node
Dedicated GPFS filesystem:
- 57.6 TB usable disk storage; 9.6 GB/sec aggregate I/O bandwidth
Erebus, on Ross Island, is Antarctica's most famous volcanic peak and one of the largest volcanoes in the world, within the top 20 in total size and reaching a height of 12,450 feet.

Yellowstone Software
Compilers, libraries, debugger & performance tools:
- Intel Cluster Studio (Fortran, C++, performance & MPI libraries, trace collector & analyzer), 50 concurrent users
- Intel VTune Amplifier XE performance optimizer, 2 concurrent users
- PGI CDK (Fortran, C, C++, pgdbg debugger, pgprof), 50 concurrent users
- PGI CDK GPU Version (Fortran, C, C++, pgdbg debugger, pgprof), for DAV systems only, 2 concurrent users
- PathScale EKOPath (Fortran, C, C++, PathDB debugger), 20 concurrent users
- Rogue Wave TotalView debugger, 8,192 floating tokens
- IBM Parallel Environment (POE), including IBM HPC Toolkit
System software:
- LSF-HPC batch subsystem / resource manager (IBM has purchased Platform Computing, Inc., developers of LSF-HPC)
- Red Hat Enterprise Linux (RHEL) Version 6
- IBM General Parallel File System (GPFS)
- Mellanox Universal Fabric Manager
- IBM xCAT cluster administration toolkit

Yellowstone Schedule: Delivery, Installation, Acceptance & Production
[Timeline chart spanning Nov 2011 through Oct 2012, with overlapping phases: infrastructure & test systems delivery & installation, storage delivery & installation, CFDS & DAV delivery & installation, HPC delivery & installation, integration & checkout, acceptance testing, ASD & early users, and production science.]

Janus: Available now
- Janus Dell Linux cluster: 16,416 cores total; 184 TFLOPs peak
- 1,368 nodes; 12 cores, 24 GB memory per node
- Intel Westmere processors, 2.8 GHz clock; 32.8 TB total memory
- QDR InfiniBand interconnect
- Red Hat Linux, Intel compilers (PGI coming)
- Deployed by CU in collaboration with NCAR; ~10% of the system allocated by NCAR
- Available for Small allocations to university and NCAR users
- CESM, WRF already ported and running; key elements of the NCAR software stack already installed
www2.cisl.ucar.edu/docs/janus-cluster
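
The same back-of-the-envelope check applies to Janus, assuming 4 double-precision FLOPs per cycle per Westmere core (SSE, half the AVX rate assumed for Yellowstone above); again an illustrative assumption rather than a figure from the slides.

```python
# Back-of-the-envelope check for Janus, assuming 4 double-precision FLOPs per
# cycle per Westmere core (SSE), i.e. half the AVX rate assumed for Yellowstone.

nodes, cores_per_node, mem_per_node_gb = 1368, 12, 24
clock_ghz, flops_per_cycle = 2.8, 4

cores = nodes * cores_per_node                               # 16,416 cores
total_mem_tb = nodes * mem_per_node_gb / 1000                # ~32.8 TB
peak_tflops = cores * clock_ghz * flops_per_cycle / 1000     # GFLOP/s -> TFLOP/s

print(cores, round(total_mem_tb, 1), round(peak_tflops))
# 16416 32.8 184
```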

Yellowstone allocation opportunities

Yellowstone funding sources
Yellowstone will be capable of 653 million core-hours per year, compared to 34 million for Bluefire, and each Yellowstone core-hour is equivalent to 1.53 Bluefire core-hours.
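
A rough sanity check of these figures, for illustration only; the 1.53 factor quoted above is the official conversion value.

```python
# Illustrative arithmetic behind the quoted core-hour figures. The 1.53 conversion
# factor is the official value; this sketch just shows it is roughly the ratio of
# the performance gain (~30x) to the growth in available core-hours (~19x).

ys_cores = 74592
hours_per_year = 24 * 365

ys_mch = ys_cores * hours_per_year / 1e6     # ~653 million core-hours per year
capacity_ratio = ys_mch / 34                 # vs ~34 million core-hours on Bluefire
work_per_core_hour = 29.8 / capacity_ratio   # Bluefire equivalents per core-hour

print(round(ys_mch), round(capacity_ratio, 1), round(work_per_core_hour, 2))
# 653 19.2 1.55
```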

Yellowstone allocation opportunities
The segments for CSL, University, and NCAR users each represent about 170 million core-hours per year on Yellowstone (compared to less than 10 million per year on Bluefire), plus a similar portion of DAV and GLADE resources.

Early-use opportunity: Accelerated Scientific Discovery (ASD)
Deadline: January 13, 2012
- Targeting a small number of rapid-turnaround, large-scale projects
- Minimum HPC request of 5 million core-hours
- Roughly May-July 2012, with access to DAV systems beyond that point through the final report deadline, February 2013
- Approximately 140 million core-hours, in two parts:
  - University-led projects with NSF awards in the geosciences will be allocated 70 million core-hours
  - NCAR-led projects will make up the other half, selected from NCAR Strategic Capability requests that designate themselves ASD-ready
- Particularly looking for projects that contribute to NWSC Community Science Objectives
- High bar for production readiness, including availability of staff time
www2.cisl.ucar.edu/docs/allocations/asd

Climate Simulation Laboratory (CSL)
Deadline: mid-February 2012
- Targets large-scale, long-running simulations of the Earth's climate
- Dedicated facility supported by the U.S. Global Change Research Program
- Must be climate-related work, but support may be from any agency
- Minimum request and award size:
  - Typically 18-month allocation period
  - Approx. 250 million core-hours to be allocated
  - Estimated minimum request size: ~10 million core-hours
- Preference given to large, collective group efforts, preferably interdisciplinary teams

University allocations
Next deadline: March 26, 2012
- Large allocations will continue to be reviewed and awarded twice per year
  - Deadlines in March and September
  - Approx. 85 million core-hours to be allocated at each opportunity
- Small allocations will also be available once the system enters full production
  - Small threshold still to be determined
  - Small allocations for researchers with an NSF award, appropriate for benchmarking and preparation for a large request
  - Small, one-time allocations for grad students, post-docs, and new faculty without an NSF award
  - Classroom allocations for instructional use
www2.cisl.ucar.edu/docs/allocations/university

Wyoming-NCAR Alliance
Deadline: March 2012 (TBD)
- 13% of Yellowstone resources: 75 million core-hours per year
- U Wyoming-managed process
  - Activities must have substantial U Wyoming involvement
  - Allocated projects must have a Wyoming lead
- Extended list of eligible fields of science
- Eligible funding sources not limited to NSF
- Actively seeking to increase collaborations with NCAR and with other EPSCoR states
- Otherwise, process modeled on University allocations, with panel review of large requests
www.uwyo.edu/nwsc

NCAR Strategic Capability projects
First submission deadline: January 13, 2012
- 17% of Yellowstone resources: 100 million core-hours per year (80x the NCAR Capability Computing annual levels)
- NCAR-led activities linked to NCAR scientific/strategic priorities and/or the NWSC science justification
- Target large-scale projects within a finite time period (one year to a few years)
- Minimum of 5 million core-hours (exceptions possible)
- Proposed and reviewed each year
- Submission format and review criteria similar to those used for large University requests
- NCAR ASD projects will be selected from requests here that designate themselves ASD-ready
www2.cisl.ucar.edu/docs/allocations/ncar

Allocation changes in store
- Not just HPC, but DAV, HPSS, and GLADE allocations
  - Non-HPC resources account for roughly 1/3 of the procurement cost
  - Ensure that use of scarce and costly resources is directed to the most meritorious projects
- Balance between the time to prepare and review requests and the resources provided
  - Minimize user hurdles and reviewer burden
  - Build on the familiar process for requesting HPC allocations
- Want to identify projects contributing to the NWSC Community Scientific Objectives
  - www2.cisl.ucar.edu/resources/yellowstone/science
- All-new, redesigned accounting system (SAM)
  - Separate, easier-to-understand allocations
  - Switchable 30/90 option, per project, as an operational control ("30/90" familiar to NCAR labs and CSL awardees)

General submission format
Please see specific opportunities for detailed guidelines!
Five-page request:
A. Project information (title, lead, etc.)
B. Project overview and strategic linkages
C. Science objectives
D. Computational experiments and resource requirements (HPC, DAV, and storage)
Supporting information:
E. Multi-year plan (if applicable)
F. Data management plan
G. Accomplishment report
H. References and additional figures

Tips and advice
- Remember your audience: computational geoscientists from national labs, universities, and NCAR
  - Don't assume they are experts in your specialty
- Be sure to articulate relevance and linkages between funding award, computing project, eligibility criteria, and NWSC science objectives (as appropriate)
- Don't submit a science proposal
  - Describe the science in detail sufficient to justify the computational experiments proposed
- Most of the request should focus on computational experiments and resource needs
  - Effective methodology
  - Appropriateness of experiments
  - Efficiency of resource use

Justifying resource needs
- HPC: similar to current practice
  - Cost of runs necessary to carry out the experiment, supported by benchmark runs or published data (see the sketch after this list)
- DAV: will be allocated, similar to HPC practice
  - A small allocation will be granted upon request
  - Allocation review to focus on larger needs associated with batch use
  - Memory and GPU charging to be considered
- HPSS: focus on storage needs above a threshold
  - 20-TB default threshold initially; perhaps a lower default for small allocations
  - CISL to evaluate the threshold regularly to balance requester/reviewer burden with demand on resources
  - Simplified request/charging formula
- GLADE: project (long-term) spaces will be reviewed and allocated; scratch and user spaces not allocated
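
As an illustration of the kind of cost estimate reviewers expect for an HPC request, the sketch below sizes a hypothetical experiment; every input value is invented and would need to be replaced with numbers from your own benchmark runs.

```python
# Hypothetical sizing of an HPC request. Every number below is invented for
# illustration; in a real request they would come from your own benchmark runs.

nodes = 64                          # nodes used per run (hypothetical)
cores_per_node = 16                 # Yellowstone batch nodes
wallclock_hours_per_sim_year = 12   # measured in a benchmark run (hypothetical)
simulated_years = 100               # length of each simulation (hypothetical)
ensemble_members = 5                # number of runs (hypothetical)

core_hours = (nodes * cores_per_node * wallclock_hours_per_sim_year
              * simulated_years * ensemble_members)
print(f"{core_hours:,} core-hours")   # 6,144,000 core-hours
```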

GLADE resource requests
- Only for project space; no need to detail use of scratch or user spaces
- Describe why project space is essential
  - That is, why scratch or user space is insufficient
  - Show that you are aware of the differences
  - Shared data, frequently used, not available on disk from RDA or ESG (collections space)
- Relate the storage use to your workflow and computational plan
  - Projects with data-intensive workflows should show they are using resources efficiently

HPSS resource requests
Goal: demonstrate that HPSS use is efficient and appropriate
- Not "fire and forget" into a data coffin
- Not using it as a temporary file system
- Explain new data to be generated
  - Relate to the computational experiments proposed
  - Describe the scientific value of, and need for, the data stored
- Justify existing stored data
  - Reasons for keeping, timeline for deletion
- Data management plan (supplementary information)
  - Additional details on the plans and intents for sharing, managing, analyzing, and holding the data

http://www2.cisl.ucar.edu/resources/yellowstone
http://www2.cisl.ucar.edu/docs/allocations
cislhelp@ucar.edu or dhart@ucar.edu
QUESTIONS?