NCAR Globally Accessible Data Environment (GLADE)
Updated: 15 Feb 2017

Overview

The Globally Accessible Data Environment (GLADE) provides centralized file storage for HPC computational, data-analysis, visualization and science gateway resources. The environment provides more than 220 GB/s of high-performance bandwidth, accessible over FDR/EDR InfiniBand or 10/40 Gb Ethernet networks. GLADE also provides data transfer services to facilitate data movement both within NCAR and to external resources. The data access nodes support common data transfer protocols, Globus transfer service endpoints, Globus data sharing services and high-bandwidth access to the NCAR HPSS system.

File Systems

NCAR HPC file systems are currently all part of the Globally Accessible Data Environment (GLADE) and can be accessed by all current HPC systems, simplifying data sharing between platforms. File spaces are configured for different purposes, with a total of five spaces available, served by three file systems. All file systems are implemented using IBM's Spectrum Scale parallel file system, formerly GPFS.

Home (/glade/u/home): Permanent, relatively small storage for data such as source code and shell scripts. This file system is not tuned for high performance from parallel jobs, so it is not recommended for use during batch job runs.

Project (/glade/p): Large, allocated, permanent, high-performance file system. Project directories are intended for sharing data within a group of researchers and are allocated as part of the annual allocation process. This file system is recommended for computational runs whose output will be accessed again in the near future.

Work (/glade/p/work): Medium, permanent, high-performance file space. Directories intended for individual work where data needs to remain resident for an extended period of time.

Scratch (/glade/scratch): Large, high-performance file system.
Place large data files in this file system for capacity and capability computing. Data is purged as described below, so important files must be saved elsewhere, such as to HPSS, before they reach the purge age.

Share (/glade/p/datashare): Medium, permanent, high-performance file space. Accessible through Globus Online services and intended to facilitate data transfers to non-NCAR users. Available on request.

GLADE Overview 15Feb17 1
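Because scratch data is purged after the retention window, users may want to check which files are approaching the limit before the purge runs. The sketch below is an illustration of that check, not an NCAR-provided tool; the 45-day window comes from the policy table, and the directory passed in is whatever scratch path the user owns.

```python
import time
from pathlib import Path

PURGE_DAYS = 45  # scratch purge window from the policy table


def files_at_risk(root, purge_days=PURGE_DAYS, now=None):
    """Return files under root whose modification time is older than purge_days."""
    now = time.time() if now is None else now
    cutoff = now - purge_days * 86400
    at_risk = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            at_risk.append(path)
    return at_risk
```

Files the function reports should be copied to project space or to HPSS before they are purged.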
Summary of File Space Policies

File Space            Type        Peak Perf  Quota      Backups  Purge
/glade/u/home/user    SS Fileset  8 GB/s     25 GB      Yes      No
/glade/p/project      SS Fileset  90 GB/s    allocated  No       No
/glade/p/work/user    SS Fileset  90 GB/s    512 GB     No       No
/glade/scratch/user   SS          220 GB/s   10 TB      No       > 45 days
/glade/p/datashare    SS          90 GB/s    allocated  No       No

File Space Intended Use

/glade/u/home: Holds source code, executables, configuration files, etc. NOT meant to hold the output from your application runs; the scratch or project file systems should be used for computational output. Optimized for small- to medium-sized files.

/glade/p: Sharing data within a team or across computational platforms. Store application output files or common scripts, source code and executables. Intended for actively used data that needs to remain resident for an extended period of time. Optimized for high-bandwidth, large-block-size access to large files.

/glade/p/work: Store application output files intended for individual use. Intended for actively used data that needs to remain resident for an extended period of time. Optimized for high-bandwidth, large-block-size access to large files.

/glade/scratch: Intended for temporary uses such as storage of checkpoints or application result output. If files need to be retained longer than the purge period, they should be copied to project space or to HPSS. Optimized for high-bandwidth, large-block-size access to large files.
/glade/p/datashare: Sharing data with non-NCAR users. Intended for transient data being delivered offsite. Only available through the Globus Online service. Optimized for network-based transfers.

Summary of File Space Capacities

File Space           Capacity  Block Size
/glade/u/home        90 TB     1 MB
/glade/p             16.5 PB   4 MB
/glade/p/work        768 TB    4 MB
/glade/scratch       15 PB     8 MB
/glade/p/datashare   500 TB    4 MB

Current File System Usage

The following table shows current usage across all the available GLADE file systems, broken down by object type. Note that the amount of space actually being used differs from the amount of space allocated; this is due to the minimum allocated block size. The data is current as of 14 Feb 2017.

                        Total        Space Used   Space Allocated
Total Size                           14.0516 PB   14.1511 PB
Number of Files         894,960,550  14.0502 PB   14.1476 PB
Number of Directories   47,643,338   1.53 TB      2.95 TB
Number of Links         66,704,055   4.91 GB      661.97 GB
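The gap between used and allocated space comes from rounding each file up to the file system's minimum allocation unit. The arithmetic can be sketched as follows; for simplicity this assumes the minimum allocation unit equals the block size from the capacity table, whereas Spectrum Scale actually allocates small files in subblocks, so the real overhead per file is smaller.

```python
import math


def allocated_size(file_bytes, alloc_unit):
    """Disk space consumed when a file is rounded up to a whole number
    of allocation units (simplified model: unit = file system block size)."""
    if file_bytes == 0:
        return 0
    return math.ceil(file_bytes / alloc_unit) * alloc_unit


# Example: a 1 KB file on a file system with a 1 MB allocation unit
# still consumes a full 1 MB of allocated space.
one_mb = 1024 * 1024
overhead = allocated_size(1024, one_mb) - 1024
```

With hundreds of millions of small files, this per-file rounding accounts for the roughly 0.1 PB difference shown in the usage table.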
File Distribution

Size Bucket    # of Files    % of Files   Total Size   % of Capacity
0 Bytes        33,312,365    3.72%        0 bytes      0.00%
< 512 Bytes    92,758,142    10.36%       15.39 GB     0.00%
< 4 KB         176,062,019   19.67%       354.50 GB    0.00%
< 16 KB        155,940,790   17.42%       1.26 TB      0.01%
< 128 KB       195,007,583   21.79%       9.11 TB      0.06%
< 512 KB       75,125,649    8.39%        18.31 TB     0.13%
< 1 MB         19,218,331    2.15%        13.04 TB     0.09%
< 4 MB         53,616,514    5.99%        111.86 TB    0.78%
< 100 MB       76,426,507    8.54%        1.7527 PB    12.47%
< 1 GB         14,870,396    1.66%        4.3089 PB    30.67%
< 10 GB        2,526,789     0.28%        5.3032 PB    37.74%
< 100 GB       93,009        0.01%        2.0456 PB    14.56%
< 1 TB         2,420         0.00%        443.46 TB    3.08%
1 TB+          36            0.00%        57.62 TB     0.40%
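A distribution like the one above can be produced by bucketing file sizes against the same thresholds. The sketch below shows the bucketing logic only; the bucket boundaries are taken from the table, and how NCAR actually gathers the sizes (e.g. from a file system scan) is outside this illustration.

```python
# Bucket labels with exclusive upper bounds in bytes, matching the table.
# "0 Bytes" catches only empty files; "1 TB+" has no upper bound.
BUCKETS = [
    ("0 Bytes", 1),
    ("< 512 Bytes", 512),
    ("< 4 KB", 4 * 1024),
    ("< 16 KB", 16 * 1024),
    ("< 128 KB", 128 * 1024),
    ("< 512 KB", 512 * 1024),
    ("< 1 MB", 1024 * 1024),
    ("< 4 MB", 4 * 1024 * 1024),
    ("< 100 MB", 100 * 1024 * 1024),
    ("< 1 GB", 1024 ** 3),
    ("< 10 GB", 10 * 1024 ** 3),
    ("< 100 GB", 100 * 1024 ** 3),
    ("< 1 TB", 1024 ** 4),
    ("1 TB+", None),
]


def bucket_for(size_bytes):
    """Return the label of the first bucket whose upper bound exceeds the size."""
    for label, upper in BUCKETS:
        if upper is None or size_bytes < upper:
            return label


def histogram(sizes):
    """Count how many of the given file sizes fall into each bucket."""
    counts = {label: 0 for label, _ in BUCKETS}
    for s in sizes:
        counts[bucket_for(s)] += 1
    return counts
```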
Systems Served

System               Description                        Purpose                                         Connect
cheyenne             SGI ICE-XA cluster                 main computational cluster                      EDR IB
yellowstone          IBM iDataPlex cluster              main computational cluster                      FDR IB
geyser               IBM xSeries cluster, NVIDIA GPU    data analysis and visualization                 FDR IB
caldera              IBM iDataPlex cluster, NVIDIA GPU  GPGPU computational cluster                     FDR IB
pronghorn            IBM iDataPlex cluster, NVIDIA GPU  GPGPU computational cluster                     FDR IB
data-access          Linux cluster                      Globus data transfer and data sharing services  40GbE
RDA science gateway  Linux cluster, web services        Research Data Archive Service                   10GbE
ESG science gateway  Linux cluster, web services        Earth Systems Grid Service                      10GbE
CDP science gateway  Linux cluster, web services        Community Data Portal Service                   10GbE

Data Access Nodes

Service             Bandwidth
Globus / GridFTP    20 GB/s
Globus+ Data Share  20 GB/s
HPSS                30 GB/s
Current GLADE Architecture

At this time, GLADE resources are split between the FDR IB network of Yellowstone and the EDR IB network of Cheyenne. The following diagram shows the various components and network connections of GLADE.

Figure 1. Current GLADE Architecture
Future GLADE Architecture

In early 2018, the Yellowstone HPC resource will be decommissioned. At that time, all GLADE resources will be relocated to the EDR IB network of Cheyenne. The following diagram shows the various components and network connections of GLADE as planned for early 2018.

Figure 2. Future GLADE Architecture