NCAR Globally Accessible Data Environment (GLADE) Updated: 15 Feb 2017

Overview

The Globally Accessible Data Environment (GLADE) provides centralized file storage for HPC computational, data-analysis, visualization and science gateway resources. The environment provides more than 220 GB/s of high-performance bandwidth, accessible over FDR/EDR InfiniBand or 10/40 Gb Ethernet networks. GLADE also provides data transfer services to facilitate data movement both within NCAR and to external resources. The data access nodes support common data transfer protocols, Globus transfer service endpoints, Globus data sharing services and high-bandwidth access to the NCAR HPSS system.

File Systems

NCAR HPC file systems are currently all part of the Globally Accessible Data Environment (GLADE) and can be accessed by all current HPC systems, simplifying data sharing between platforms. The file systems are configured for different purposes, with a total of five spaces served by three file systems. All file systems are implemented using IBM's Spectrum Scale parallel file system, formerly GPFS.

Home (/glade/u/home): Permanent, relatively small storage for data such as source code, shell scripts, etc. This file system is not tuned for high performance from parallel jobs, so it is not recommended for use during batch job runs.

Project (/glade/p): Large, allocated, permanent, high-performance file system. Project directories are intended for sharing data within a group of researchers and are allocated as part of the annual allocation process. This file system is recommended for computational runs whose output will be accessed again in the near future.

Work (/glade/p/work): Medium, permanent, high-performance file space. Work directories are intended for individual work where data needs to remain resident for an extended period of time.

Scratch (/glade/scratch): Large, high-performance file system. Place large data files in this file system for capacity and capability computing. Data is purged as described below, so you must save important files elsewhere, such as HPSS, before the purge period.

Share (/glade/p/datashare): Medium, permanent, high-performance file space. Accessible through Globus Online services and intended to facilitate data transfers to non-NCAR users. Available on request.
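Because the scratch purge removes older files automatically, it can be useful to see ahead of time which files are at risk. The following Python sketch is illustrative only: it is not an NCAR-provided tool, and it assumes the purge is keyed on time since last access (confirm the current policy before relying on it). It walks a directory tree and reports files that have not been accessed within the 45-day window so they can be copied to project space or HPSS first.

```python
#!/usr/bin/env python3
"""List files that are candidates for the scratch purge.

Illustrative sketch only: walks a directory tree and prints files whose
last-access time is older than the purge window described in this document.
"""
import os
import sys
import time

PURGE_DAYS = 45  # purge window from the policy table below

def purge_candidates(root, days=PURGE_DAYS):
    cutoff = time.time() - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_atime < cutoff:
                yield path, st.st_size

if __name__ == "__main__":
    # Example: python purge_check.py /glade/scratch/$USER
    total = 0
    for path, size in purge_candidates(sys.argv[1]):
        total += size
        print(path)
    print(f"{total / 1e12:.3f} TB not accessed in {PURGE_DAYS} days",
          file=sys.stderr)
```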

Summary of File Space Policies

File Space            Type        Peak Perf  Quota      Backups  Purge
/glade/u/home/user    SS Fileset  8 GB/s     25 GB      Yes      No
/glade/p/project      SS Fileset  90 GB/s    allocated  No       No
/glade/p/work/user    SS Fileset  90 GB/s    512 GB     No       No
/glade/scratch/user   SS          220 GB/s   10 TB      No       > 45 days
/glade/p/datashare    SS          90 GB/s    allocated  No       No

File Space Intended Use

/glade/u/home: Holds source code, executables, configuration files, etc. NOT meant to hold the output from your application runs; the scratch or project file systems should be used for computational output. Optimized for small to medium sized files.

/glade/p: Sharing data within a team or across computational platforms. Store application output files or common scripts, source code and executables. Intended for actively used data that needs to remain resident for an extended period of time. Optimized for high-bandwidth, large-block-size access to large files.

/glade/p/work: Store application output files intended for individual use. Intended for actively used data that needs to remain resident for an extended period of time. Optimized for high-bandwidth, large-block-size access to large files.

/glade/scratch: Intended for temporary uses such as storage of checkpoints or application result output. If files need to be retained longer than the purge period, they should be copied to project space or to HPSS. Optimized for high-bandwidth, large-block-size access to large files.

/glade/p/datashare: Sharing data with non-NCAR users. Intended for transient data being delivered offsite. Only available through the Globus Online service. Optimized for network-based transfers.
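As a companion to the purge sketch above, the next snippet gives a rough idea of how a directory tree's footprint compares with the quotas in the policy table. It is a sketch, not a replacement for the file system's own quota reporting: it approximates allocated space from st_blocks (reported in 512-byte units) and uses the 10 TB scratch quota from the table as an example threshold.

```python
#!/usr/bin/env python3
"""Rough check of a directory tree's allocated space against a quota.

Sketch only; authoritative numbers come from the file system's quota
tools. Allocated space is approximated from st_blocks (512-byte units).
"""
import os
import sys

QUOTA_BYTES = 10 * 1000**4  # example threshold: the 10 TB scratch quota above

def allocated_bytes(root):
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name), follow_symlinks=False)
            except OSError:
                continue
            total += st.st_blocks * 512
    return total

if __name__ == "__main__":
    used = allocated_bytes(sys.argv[1])
    print(f"allocated: {used / 1e12:.2f} TB "
          f"({100 * used / QUOTA_BYTES:.1f}% of a 10 TB quota)")
```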

Summary of File Space Capacities

File Space            Capacity  Block Size
/glade/u/home         90 TB     1 MB
/glade/p              16.5 PB   4 MB
/glade/p/work         768 TB    4 MB
/glade/scratch        15 PB     8 MB
/glade/p/datashare    500 TB    4 MB

Current File System Usage

The following table summarizes current usage across all of the available GLADE file systems, broken down by object type. Note that there is a difference between the amount of space actually being used and the amount of space allocated; this is due to the minimum allocated block size. The data is current as of 14 Feb 2017.

                       Count        Space Used   Space Allocated
Total Size             -            14.0516 PB   14.1511 PB
Number of Files        894,960,550  14.0502 PB   14.1476 PB
Number of Directories  47,643,338   1.53 TB      2.95 TB
Number of Links        66,704,055   4.91 GB      661.97 GB
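The gap between space used and space allocated comes from the minimum allocation unit of each file system. As a worked example, assuming the classic Spectrum Scale layout of 32 sub-blocks per block (an assumption for illustration, not a documented GLADE parameter), the smallest possible on-disk footprint of a non-empty file on each space would be:

```python
# Worked example: minimum on-disk footprint of a small file on each GLADE
# space, assuming the classic Spectrum Scale layout of 32 sub-blocks per
# block. The sub-block count is an assumption; check the actual file
# system configuration.
BLOCK_SIZES = {                      # block sizes from the capacity table
    "/glade/u/home":      1 * 1024**2,
    "/glade/p":           4 * 1024**2,
    "/glade/p/work":      4 * 1024**2,
    "/glade/scratch":     8 * 1024**2,
    "/glade/p/datashare": 4 * 1024**2,
}
SUBBLOCKS_PER_BLOCK = 32

for fs, block in BLOCK_SIZES.items():
    subblock = block // SUBBLOCKS_PER_BLOCK
    print(f"{fs:20s} block {block // 1024**2} MB, "
          f"minimum allocation {subblock // 1024} KB per non-empty file")
```

Under that assumption a one-byte file on /glade/scratch would still occupy 256 KB; spread across hundreds of millions of small files (see the distribution below), this rounding helps explain the roughly 100 TB difference between used and allocated space in the table above.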

File Distribution

Size Bucket    # of Files   % of Files   Total Size   % of Capacity
0 Bytes        33,312,365   3.72%        0 bytes      0.00%
< 512 Bytes    92,758,142   10.36%       15.39 GB     0.00%
< 4 KB         176,062,019  19.67%       354.50 GB    0.00%
< 16 KB        155,940,790  17.42%       1.26 TB      0.01%
< 128 KB       195,007,583  21.79%       9.11 TB      0.06%
< 512 KB       75,125,649   8.39%        18.31 TB     0.13%
< 1 MB         19,218,331   2.15%        13.04 TB     0.09%
< 4 MB         53,616,514   5.99%        111.86 TB    0.78%
< 100 MB       76,426,507   8.54%        1.7527 PB    12.47%
< 1 GB         14,870,396   1.66%        4.3089 PB    30.67%
< 10 GB        2,526,789    0.28%        5.3032 PB    37.74%
< 100 GB       93,009       0.01%        2.0456 PB    14.56%
< 1 TB         2,420        0.00%        443.46 TB    3.08%
1 TB+          36           0.00%        57.62 TB     0.40%
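A distribution like this can be approximated for a single directory tree with a short script. The sketch below is not the tool that produced the table above; it simply bins the files under a given directory into the same size buckets.

```python
#!/usr/bin/env python3
"""Bin files under a directory into the size buckets used in the table above.

Sketch for per-directory spot checks, not the site-wide scan that
produced the published numbers.
"""
import bisect
import os
import sys

# Upper bounds of the buckets, in bytes (the last bucket is open-ended).
BOUNDS = [1, 512, 4 * 1024, 16 * 1024, 128 * 1024, 512 * 1024,
          1024**2, 4 * 1024**2, 100 * 1024**2, 1024**3,
          10 * 1024**3, 100 * 1024**3, 1024**4]
LABELS = ["0 Bytes", "< 512 Bytes", "< 4 KB", "< 16 KB", "< 128 KB",
          "< 512 KB", "< 1 MB", "< 4 MB", "< 100 MB", "< 1 GB",
          "< 10 GB", "< 100 GB", "< 1 TB", "1 TB+"]

def histogram(root):
    counts = [0] * len(LABELS)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue
            counts[bisect.bisect_right(BOUNDS, size)] += 1
    return counts

if __name__ == "__main__":
    for label, count in zip(LABELS, histogram(sys.argv[1])):
        print(f"{label:12s} {count}")
```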

Systems Served

System               Description                        Purpose                                          Connection
cheyenne             SGI ICE-XA Cluster                 main computational cluster                       EDR IB
yellowstone          IBM iDataPlex Cluster              main computational cluster                       FDR IB
geyser               IBM xSeries Cluster, NVIDIA GPU    data analysis and visualization                  FDR IB
caldera              IBM iDataPlex Cluster, NVIDIA GPU  GPGPU computational cluster                      FDR IB
pronghorn            IBM iDataPlex Cluster, NVIDIA GPU  GPGPU computational cluster                      FDR IB
data-access          Linux Cluster                      Globus data transfer services, data sharing      40GbE
RDA science gateway  Linux Cluster, web services        Research Data Archive Service                    10GbE
ESG science gateway  Linux Cluster, web services        Earth Systems Grid Service                       10GbE
CDP science gateway  Linux Cluster, web services        Community Data Portal Service                    10GbE

Data Access Nodes

Service              Bandwidth
Globus / GridFTP     20 GB/s
Globus+ Data Share   20 GB/s
HPSS                 30 GB/s
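Transfers through the data access nodes can be driven from the Globus web interface, the Globus CLI, or the Globus Python SDK. The following SDK sketch is illustrative only: the endpoint UUIDs, access token, and paths are placeholders (the real GLADE endpoint must be looked up in the Globus web application), and it assumes the globus-sdk package is installed and a transfer-scoped access token has already been obtained through the normal Globus login flow.

```python
import globus_sdk

# Placeholders; none of these values are real.
TRANSFER_TOKEN  = "REPLACE_WITH_ACCESS_TOKEN"
GLADE_ENDPOINT  = "REPLACE_WITH_GLADE_ENDPOINT_UUID"
REMOTE_ENDPOINT = "REPLACE_WITH_DESTINATION_ENDPOINT_UUID"

# Transfer client authenticated with an existing transfer-scoped token.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
)

# Describe a transfer of one file from the GLADE data-sharing space to
# the remote endpoint; the paths below are illustrative only.
tdata = globus_sdk.TransferData(
    tc, GLADE_ENDPOINT, REMOTE_ENDPOINT, label="GLADE datashare example"
)
tdata.add_item("/glade/p/datashare/myproject/results.nc",
               "/home/collaborator/results.nc")

task = tc.submit_transfer(tdata)
print("Submitted transfer, task id:", task["task_id"])
```

The same transfer can be performed from the Globus web interface or CLI without writing any code.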

Current GLADE Architecture

At this time, GLADE resources are split between the FDR IB network of Yellowstone and the EDR network of Cheyenne. The following diagram shows the various components and network connections of GLADE.

Figure 1. Current GLADE Architecture

Future GLADE Architecture

In early 2018, we will be decommissioning the Yellowstone HPC resource. At that time, all GLADE resources will be relocated to the EDR network of Cheyenne. The following diagram shows the various components and network connections of GLADE in early 2018.

Figure 2. Future GLADE Architecture