CISL Update. 29 April Operations and Services Division

CISL Update: Operations and Services
CISL HPC Advisory Panel Meeting
Anke Kamrath, anke@ucar.edu
Operations and Services Division, Computational and Information Systems Laboratory
CHAP Meeting, 14 May 2009

Overview
- Staff comings and goings in OSD
- Updates:
  - NWSC-1 RFP update
  - Data services vision for NWSC
  - Preparing for the changing data workflow
    - Archival migration: MSS to HPSS
    - Cray XT5 rack (Lynx)
    - GLADE deployment (presentation from Pam)
    - Storage accounting (discussion with CHAP)
  - Export controls
- Users and allocations
  - University publications in 2009
  - NWSC allocation plans
    - WNA (Wyoming-NCAR Allocations)
    - Assessing science merit of WNA proposals (discussion with CHAP)
- New TIGGE award

Staff Comings and Goings

Changes:
- Retirements: Ed Arnold, DSG (April); Juli Rew, CSG (end of May); Julie Chapin, WEG (June); Leif Magden, WEG (April)
- ESS: Jasen Boyington, co-location manager
- HSS/CSG: Rory Kelly moved from TDD to CSG (replacing Michael Page)
- HSS/DASG: Dan Lagreca on the VAPOR project
- NETS: Blake Caldwell, Network Engineer I

Openings:
- User Services role being elevated to section level; job opening posted and interviews underway
- DBA II open
- DSG Group Leader (Lynda McGinley returning to a technical role)

NWSC-1 RFP Update
- SAP (Science Advisory Panel)
  - Met mid-April 2010; meeting again in May
  - Refining science requirements via questionnaire (see handout)
  - CHAP: Bert is representing CHAP and will share the questionnaire
  - Will analyze results and provide them to the TET
  - Will identify key benchmark applications
- TET (Technology Evaluation Team) and BET (Business Evaluation Team) are kicking off their efforts soon.

A Changing Scientific Data Workflow
From "process centric" to "information centric".

NWSC Conceptual Data Architecture (diagram)
- External access: remote visualization, partner sites, TeraGrid sites, science gateways, RDA, ESG
- Data transfer services over 10 Gb/40 Gb/100 Gb Ethernet
- Archive: HPSS, 150 PB
- High-bandwidth I/O network: InfiniBand, 10 Gb/40 Gb Ethernet
- Central storage cluster (15 PB, 300 GB/s burst) holding data collections, project spaces, scratch, and the archive interface
- Data analysis and visualization nodes; computational nodes

Challenges and Trade-offs for Procurement
How to split up the dollars: roughly $30M for compute and disk, $6.4M for the archive.
One approach is to fix the I/O bandwidth and filesystem size and see what PFLOPS come out (see the arithmetic sketch after the table):

Config  Sustained I/O  Peak I/O   Filesystem  Storage ($M)  PFLOPS (max $/PF)  PFLOPS (avg $/PF)  PFLOPS (min $/PF)  Compute ($M)
1       225 GB/s       300 GB/s   15 PB       $10.12        0.43               0.83               1.39               $19.88
2       225 GB/s       300 GB/s    6 PB       $9.03         0.45               0.88               1.46               $20.97
3       112 GB/s       150 GB/s   15 PB       $7.51         0.49               0.94               1.57               $22.49
4       112 GB/s       150 GB/s    6 PB       $6.67         0.50               0.98               1.63               $23.33
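The PFLOPS columns are just the remaining compute budget divided by an assumed cost per sustained petaflop. A minimal Python sketch of that arithmetic follows; only the $30M compute-plus-disk total comes from the slide, and the three $/PFLOPS price points are hypothetical assumptions chosen to roughly reproduce the Config 1 row.

# Sketch of the budget trade-off arithmetic behind the table above.
# Assumption: the $M-per-sustained-PFLOPS price points below are
# illustrative only; the slide does not state them.

TOTAL_BUDGET_M = 30.0  # $M for compute + disk (archive budgeted separately)

COST_PER_PF_M = {      # hypothetical $M per sustained PFLOPS
    "max": 46.0,
    "avg": 24.0,
    "min": 14.3,
}

def pflops_for_storage_spend(storage_cost_m: float) -> dict:
    """PFLOPS attainable under each price assumption after spending
    storage_cost_m ($M) of the fixed budget on the filesystem."""
    compute_budget_m = TOTAL_BUDGET_M - storage_cost_m
    return {k: round(compute_budget_m / v, 2) for k, v in COST_PER_PF_M.items()}

# Config 1: a $10.12M filesystem leaves $19.88M for compute,
# giving roughly 0.43 / 0.83 / 1.39 PFLOPS under the three assumptions.
print(pflops_for_storage_spend(10.12))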

Preparing for the Changing Data Workflow
- MSS-to-HPSS migration
- GLADE (Globally Accessible Data Environment): Pam to present
- Storage allocations: discussion during Pam's presentation
- System and filesystem evaluations: GPFS, Lustre, WAN filesystems
- Cray XT5 (Lynx)

Migrating from MSS to HPSS (Update)
- Expanded the HPSS library from 1 PB to 5 PB of capacity in January 2010
- Acquiring 150 TB of disk for the HPSS data movers (to be purchased in Q2)
- Currently have ~50 users; holding monthly meetings to consult with users on the NCAR MSS-to-HPSS transition
- Migration to be complete January 2011
- Projects:
  - Legacy tape project (accessing MSS data from HPSS): on track; code development done, additional testing underway
  - Gatekeeper/load balancing: currently in the evaluation phase (this distributes load across host clients, e.g., bluefire versus the data analysis machines, to ensure each gets a fair share of HPSS traffic)

Cray XT5 Comes to NCAR: Lynx (a small Jaguar)
- Arrived Monday, April 26
- Purpose: test and evaluate Cray products, system administration, Lustre, GLADE (GPFS/DVS), and alternate batch subsystems and resource managers
- A local resource comparable to ORNL's Jaguar, NICS's Kraken, and NERSC's Franklin, which are participating in IPCC AR5
- Single-cabinet Cray XT5m, 8.03 TFLOPS peak (see the sanity check after this list)
  - 76 compute nodes (912 processors total): two hex-core 2.2 GHz AMD Opteron (Istanbul) chips per node (12 cores/node); 16 GB of memory per node (1.33 GB/core)
  - 2 login nodes (4 processors total): one dual-core 2.6 GHz AMD Opteron each; 8 GB of memory (4 GB/core)
  - 8 I/O nodes: 4 reserved for Cray system functions and the LSI/Lustre filesystems; 2 for GPFS and Cray DVS testing (each with optical 10-GigE); 2 for Lustre and Cray DVS testing (each with optical 10-GigE)
  - Single-cabinet LSI disk subsystem (32 TB)
- Software: Cray Linux Environment; Cray, PGI, and EKOPATH compilers and libraries; MPI, SHMEM, and OpenMP; optimized math and scientific libraries; Lustre; MOAB/Torque
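The quoted 8.03 TFLOPS peak is consistent with 912 cores at 2.2 GHz issuing 4 double-precision floating-point operations per cycle per core; that per-core rate is our assumption about the Istanbul Opteron, not a figure stated on the slide. A minimal Python sanity check:

# Back-of-the-envelope check of Lynx's quoted 8.03 TFLOPS peak.
# Assumption (ours, not the slide's): 4 double-precision FLOPs per
# cycle per core for the Istanbul Opteron.

compute_nodes   = 76
cores_per_node  = 12      # two hex-core chips per node
clock_ghz       = 2.2
flops_per_cycle = 4       # assumed, per core

peak_tflops = compute_nodes * cores_per_node * clock_ghz * flops_per_cycle / 1000
print(f"peak: {peak_tflops:.2f} TFLOPS")            # ~8.03 TFLOPS, matching the slide

memory_per_core_gb = 16 / cores_per_node            # 16 GB per compute node
print(f"memory: {memory_per_core_gb:.2f} GB/core")  # ~1.33 GB/core, as quoted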

Export Controls
New requirements: no services to
- embargoed nations (Cuba, North Korea, Iran, Sudan, Syria), or
- prohibited end-users (six lists).
Services include compute, storage, phone, email, and so on.
Many universities are struggling with the requirements (no consensus within TeraGrid).
CISL is developing a phased approach:
1. HPC / storage users: first new users, then existing users
2. Data collection users
3. Software users

Users and Allocations
- FY09 university publications and presentations
- Strong university usage
- NWSC allocations
  - New category: Wyoming-NCAR Allocations (WNA)
  - Assessing the science merit of Wyoming allocations

FY2009 University Publications and Presentations That Used CISL Resources (bar chart by university: peer-reviewed, non-peer-reviewed, and presentations)
- 241 papers and presentations reported for 43 universities
- 104 peer-reviewed papers, 18 other papers, 119 presentations and proceedings

NWSC Allocations (pie chart): planned shares for the university community, the Climate Simulation Lab, NCAR labs, joint university/NCAR projects, a reserve, and the new Wyoming-NCAR Allocations (WNA) category.

Assessing Scientific Merit of WNA Allocations
We need CHAP's help as we build out the WNA process. Criteria to be considered by the WNA Allocation Committee shall include the following:
- Overall scientific merit. If a proposal has been peer reviewed and received an NSF or other federal agency grant, it will be deemed to have scientific merit; however, the WNA Allocation Committee may use other mechanisms to determine scientific merit. Proposals not funded by an NSF or other federal agency grant will initially be referred to the CISL High-performance computing Advisory Panel (CHAP) for review and approval.
- Is for research in Earth System science subject areas of substantial interest to the WNA (in identifying such areas, the WNA shall consider areas of significant interest to the State of Wyoming, as determined by UW).
- Has substantial involvement of both UW and NCAR researchers.
- Includes UW researchers as the principal or co-principal investigator.
- Strengthens UW's research capacity, directly or through collaboration with other entities.
- Directly or through collaboration, strengthens university computational science capacity in EPSCoR states.
- Is in new or emerging Earth System science research areas.

RDA: TIGGE Archive Access Improvements and Validation Data Portal
(Ensemble weather forecast data from 10 NWP centers worldwide)
- Two-year funding from NSF to support an SE II; CISL will contribute GLADE storage and computing
- Service improvements:
  - Faster access to the 300+ TB archive for CISL HPC and web services, using GLADE and HPSS
  - Faster turnaround on user requests for subsetting and re-gridding, using a Linux cluster or Lynx
  - Customized access to RDA weather observations for model forecast validation

Questions and Discussion