NCAR Workload Analysis on Yellowstone. September 2014 V4.1

2 Purpose and Scope of the Analysis
Understanding the NCAR application workload is a critical part of making efficient use of Yellowstone and in scoping future system procurements. Analysis of application performance on Yellowstone is the first step in understanding the transition needed to move to new architectures.
Primary sources of information for the analysis included:
- Science area
- Application code
- 3rd-party application usage
- Algorithm
- Job size
- Memory usage
- Threading usage
- Library usage
- I/O patterns

3 Yellowstone Environment
Yellowstone (High-performance computing)
- IBM iDataPlex Cluster with Intel Sandy Bridge processors
- 1.5 PetaFLOPs; 4,536 nodes; 72,576 Xeon E cores
- 145 TB total memory
- Mellanox FDR InfiniBand full fat-tree interconnect
GLADE (Centralized file systems and data storage)
- 16.4 PB GPFS file systems, 90 GB/s aggregate I/O bandwidth
Geyser & Caldera (Data analysis and visualization)
- Large-memory system with Intel Westmere-EX processors: 16 nodes, 640 Westmere-EX cores, 1 TB/node, 16 NVIDIA K5000 GPUs
- GPU computation/vis system with Intel Sandy Bridge processors: 16 nodes, 256 Xeon E cores, 64 GB/node, 32 NVIDIA K20X GPUs
Pronghorn (Intel Phi testbed system)
- 16 nodes, 256 Xeon E cores; 64 GB/node
- 32 Intel Phi 5110P adapters (Knights Corner)

4 Yellowstone Physical Infrastructure
Resource: # Racks
HPC: 63 x iDataPlex racks (72 nodes per rack); 10 x 19-inch racks (9 Mellanox FDR core switches, 1 Ethernet switch); 1 x 19-inch rack (login, service, management nodes)
GLADE: 19 x NSD server, controller and storage racks; 1 x 19-inch rack (I/O aggregator nodes, management, IB & Ethernet switches)
DAV: 1 x iDataPlex rack (GPU-Comp & Knights Corner); 2 x 19-inch racks (Large Memory, management, IB switch)
AMPS: 1 x iDataPlex rack; 1 x 19-inch rack (login, IB, NSD, disk & management nodes)
Total Power Required: ~1.7 MW, of which HPC ~1.4 MW

5 Yellowstone Environment
- Yellowstone: HPC resource, 1.5 PFLOPS peak
- GLADE: central disk resource, 16.4 PB
- Geyser, Caldera: DAV clusters
- Pronghorn: Phi testbed
- High-bandwidth, low-latency HPC and I/O networks: FDR InfiniBand and 10Gb Ethernet
- NCAR HPSS Archive: 160 PB capacity, ~11 PB/yr growth
- 1Gb/10Gb Ethernet (40Gb+ future) to: Science Gateways (RDA, ESG), Data Transfer Services, Remote Vis, Partner Sites, XSEDE Sites

6 User Communities
1,134 HPC users in the last 12 months; more than 450 distinct users each month
535 projects in the last 12 months; more than 250 distinct projects each month
NCAR staff (29%)
- Roughly equal use by CGD, MMM, ACD, HAO, RAL
- Smaller use by CISL, EOL, other programs
University (29%)
- Larger number of smaller-scale projects
- Many graduate students, post-docs
Climate Simulation Laboratory (28%)
- Small number (<6) of large-scale, climate-focused projects
- Large portion devoted to CESM community
Wyoming researchers (13%)
- Smaller number of activities from a broader set of science domains

7 Yellowstone usage reflects its mission to serve the atmospheric sciences
Yellowstone use since start of production:
- Climate, Large-Scale Dynamics: 53%
- All Others: 2%
- Computational Science: 3%
- Ocean Sciences: 9%
- Earth Sciences: 3%
- Mesoscale Meteorology: 3%
- Atmospheric Chemistry: 4%
- Weather Prediction: 6%
- Geospace Sciences: 9%
- Fluid Dynamics and Turbulence: 8%

8 Applications used on Yellowstone
Yellowstone Usage by Application (excluding CESM)
Chart labels: GHOST Zeus3D Chem WRF-DART WRF+ WRF WACCM NSU3D- FLOWYO MHD FWSDA CAM/IMPACT MURaM CESM-DART WRFDA MPAS CFD MHD-DART DART CAM PRS SWMF WRF-Chem HiPPSTR P3D GEOS5- ModelE NRCM NCAR LES Other HYCOM POP-SODA CAM-CARMA GCS model BATS-R-US LES ParFlow-CLM FVCOM CAM-ECHAM CAM-AGCM CLM WRF-Hydro Pencil MPAS-DA RegCM4-CLM MITgcm P3D+LFM CASINO 3D-EMPIC CM1 CESM-ROMS GFDL-FMS GFDL-CM1 SAM-LES
- 50+% of use from CESM (not shown on this chart)
- 52+ other apps/models identified in 171 projects, representing 95% of resource use

9 Most jobs are small jobs; ~50% of core-hours are consumed by jobs > 64 nodes
[Chart: % Jobs and % Core-Hours (0% to 100%) versus job size (nodes); data labels 99%, 49%, 96%]
- 99% of all jobs use 64 nodes or less, regardless of core-hours consumed
- 96% of all jobs use 1,024 nodes or less, when weighted by core-hours consumed (i.e., 4% use > 1,024 nodes)
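The two views on this slide (share of jobs versus share of core-hours at or below a given node count) can be computed from job accounting records. The sketch below is illustrative only: the `(nodes, core_hours)` record layout, the function name, and the sample values are assumptions, not data or tooling from the analysis.

```python
def cumulative_share(jobs, thresholds):
    """Return {limit: (pct_jobs, pct_core_hours)} for jobs using <= limit nodes.

    jobs: list of (nodes, core_hours) tuples (hypothetical record layout).
    """
    total_jobs = len(jobs)
    total_hours = sum(hours for _, hours in jobs)
    shares = {}
    for limit in thresholds:
        selected = [(n, h) for n, h in jobs if n <= limit]
        pct_jobs = 100.0 * len(selected) / total_jobs
        pct_hours = 100.0 * sum(h for _, h in selected) / total_hours
        shares[limit] = (pct_jobs, pct_hours)
    return shares


# Made-up sample: many small jobs, a few large ones that dominate core-hours.
sample_jobs = [(1, 10.0)] * 90 + [(32, 500.0)] * 8 + [(512, 20000.0)] * 2
shares = cumulative_share(sample_jobs, thresholds=[64, 1024])
for limit, (pct_jobs, pct_hours) in shares.items():
    print(f"<= {limit} nodes: {pct_jobs:.1f}% of jobs, {pct_hours:.1f}% of core-hours")
```

In the made-up sample, nearly all jobs fall at or below 64 nodes while most core-hours come from the two large jobs, which is the same qualitative pattern the slide describes.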

10 Recent production workload is dominated by 9-64 node jobs
[Chart: Monthly Yellowstone Resource Usage (CPU Hours) by Job Size; y-axis: Total CPU Hours (Millions); legend, largest to smallest: >4kn, 2kn-4kn, 1kn-2kn, ..., 33-64n, 17-32n, 9-16n, 5-8n, 3-4n, 2n, 1n]
Larger jobs during ASD period (early 2013)

11 Historical trends in job size (max, avg, weighted) show no dramatic shifts
[Chart series: Max nodes; Avg node size; Avg node size (weighted)]
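The slide does not say how the weighted average is defined. One common choice, assumed here, is to weight each job's node count by the core-hours it consumed, so that large, long-running jobs count for more than many tiny ones. The function names and sample records below are hypothetical.

```python
def average_job_size(jobs):
    """Plain mean of node counts: every job counts equally."""
    return sum(nodes for nodes, _ in jobs) / len(jobs)


def weighted_average_job_size(jobs):
    """Mean node count weighted by core-hours (assumed weighting)."""
    total_hours = sum(hours for _, hours in jobs)
    return sum(nodes * hours for nodes, hours in jobs) / total_hours


# Hypothetical (nodes, core_hours) records: two tiny jobs and one 64-node job.
sample_jobs = [(1, 10.0), (1, 10.0), (64, 980.0)]
plain = average_job_size(sample_jobs)
weighted = weighted_average_job_size(sample_jobs)
print(f"avg nodes: {plain:.2f}, core-hour-weighted avg nodes: {weighted:.2f}")
```

Under this definition the weighted figure sits well above the plain average whenever a few large jobs consume most of the core-hours.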

12 Daily Average Yellowstone Floating Point Efficiency is low relative to theoretical peak
[Chart: Yellowstone Floating Point Efficiency; series: Yellowstone %FP Efficiency (Daily Average % Floating Point Efficiency) and Yellowstone Avg TFLOP/s (Daily Average TeraFLOP/second); x-axis: date, starting 13/01/01]
Lifetime Average Application Floating-Point Efficiency: 1.57%
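To put the 1.57% figure in absolute terms, the sketch below assumes that efficiency means sustained floating-point rate divided by theoretical peak, and uses the 1.5 PetaFLOPs peak quoted on the environment slide. The function names are illustrative, not part of any NCAR tooling.

```python
PEAK_TFLOPS = 1500.0  # 1.5 PetaFLOPs peak, as quoted for Yellowstone


def fp_efficiency_percent(sustained_tflops, peak_tflops=PEAK_TFLOPS):
    """Efficiency as a percentage of peak (assumed definition)."""
    return 100.0 * sustained_tflops / peak_tflops


def sustained_tflops(efficiency_percent, peak_tflops=PEAK_TFLOPS):
    """Sustained rate implied by a given efficiency percentage."""
    return peak_tflops * efficiency_percent / 100.0


lifetime_avg_tflops = sustained_tflops(1.57)
print(f"1.57% of peak is about {lifetime_avg_tflops:.2f} TFLOP/s")
```

Under these assumptions, the lifetime average corresponds to roughly 23.5 TFLOP/s sustained across the machine.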

13 On average, applications use about 25% of Yellowstone's available memory
- Yellowstone has 32 GB of memory per node, which is 2 GB/core
- Data collection for various periods of time
- Memory use collected from each node every 5 minutes, then averaged over time
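The averaging described above can be sketched as follows. The sample values are invented, and the 16 cores per node is inferred from the slide's 32 GB/node at 2 GB/core; the function name is hypothetical.

```python
NODE_MEMORY_GB = 32.0  # per-node memory, from the slide
CORES_PER_NODE = 16    # implied by 32 GB/node at 2 GB/core


def mean_memory_fraction(samples_gb, node_memory_gb=NODE_MEMORY_GB):
    """Average fraction of node memory in use across periodic samples."""
    return sum(samples_gb) / len(samples_gb) / node_memory_gb


# Hypothetical 5-minute samples (GB in use) for a single node.
samples = [6.0, 8.0, 10.0, 8.0]
fraction = mean_memory_fraction(samples)
per_core_gb = fraction * NODE_MEMORY_GB / CORES_PER_NODE
print(f"{fraction:.0%} of node memory, about {per_core_gb} GB per core")
```

At 25% utilization, the average footprint works out to about 8 GB per node, or 0.5 GB per core.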

14 When looking at runtimes, most jobs consume 30 minutes or less
Two-thirds or more of the very short jobs result from data assimilation activities using the DART framework, usually on 1-4 nodes. The remainder comprises model development and testing and a small number of groups running large numbers of serial tasks.

15 When looking at core-hours consumed, distribution of runtimes is fairly uniform
- Wallclock limit for most Yellowstone queues is 12 hours.
- Prior NCAR system had wallclock limit of 6 hours.

16 GLADE: GLobally Accessible Data Environment
GPFS NSD Servers
- 20 IBM x3650 M4 nodes; Intel Xeon E5-2670 processors w/AVX
- 16 cores, 64 GB memory per node; 2.6 GHz clock
- 91.8 GB/sec aggregate I/O bandwidth (4.8+ GB/s/server)
I/O Aggregator Servers (export GPFS, GLADE-HPSS connectivity)
- 4 IBM x3650 M4 nodes; Intel Xeon E processors w/AVX
- 16 cores, 64 GB memory per node; 2.6 GHz clock
- 10 Gigabit Ethernet & FDR fabric interfaces
High-Performance I/O interconnect to HPC & DAV Resources
- Mellanox FDR InfiniBand full fat-tree
- 13.6 GB/sec bidirectional bandwidth/node
Disk Storage Subsystem
- 76 IBM DCS3700 controllers & expansion drawers, each populated with 90 3 TB NL-SAS drives/controller
- PB usable capacity
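As a rough cross-check of the disk subsystem figures, the raw capacity can be derived from the controller and drive counts. This sketch assumes decimal units (1 PB = 1000 TB) and compares the result with the 16.4 PB GPFS capacity quoted on the environment slide; treating that 16.4 PB as the usable capacity is an assumption.

```python
CONTROLLERS = 76            # IBM DCS3700 controllers, from the slide
DRIVES_PER_CONTROLLER = 90  # drives per controller, from the slide
DRIVE_TB = 3                # NL-SAS drive size in TB, from the slide
GPFS_CAPACITY_PB = 16.4     # quoted on the environment slide (assumed usable)

raw_capacity_tb = CONTROLLERS * DRIVES_PER_CONTROLLER * DRIVE_TB
raw_capacity_pb = raw_capacity_tb / 1000  # decimal PB (assumption)
usable_ratio = GPFS_CAPACITY_PB / raw_capacity_pb

print(f"raw: {raw_capacity_pb:.2f} PB, usable/raw: {usable_ratio:.1%}")
```

Under these assumptions roughly 80% of the raw capacity is available to the file systems, with the remainder presumably going to redundancy and spares.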

17 GLADE Filesystems Snapshot (8/5/14)
Columns: File System; Intended use; Capacity (PB); Used (PB); Sub-block/Block size; # Files (millions); # Directories (millions)
- /glade/u: User program files; environment. Sub-block/Block size: 16 KB / 512 KB
- Projects: Allocated project space; not purged. Sub-block/Block size: 128 KB / 4 MB
- Scratch: Scratch space; purged (currently 90 day). Sub-block/Block size: 128 KB / 4 MB

18 Project file system is dominated by a large number of small files
/glade/p File System (project space), August 2014; GPFS block size = 4 MB, GPFS sub-block size = 128 KB
Space used: 3 PB
File count: 226 million
[Chart: # of Files (Millions) and Total TB (TeraBytes) by file size; bins: < 512 B, < 128 KB, < 4 MB, < 100 MB, < 1 GB, < 10 GB, < 100 GB, < 1 TB, 1 TB+; series: # Files, # NetCDF Files, Total TB, NetCDF TB]

19 Scratch file system exhibits the same usage pattern as Projects space
/glade/scratch File System (scratch space), August 2014; GPFS block size = 4 MB, GPFS sub-block size = 128 KB
Space used: 4 PB
File count: 172 million
[Chart: # of Files (Millions) and Total TB (TeraBytes) by file size; bins: < 512 B, < 128 KB, < 4 MB, < 100 MB, < 1 GB, < 10 GB, < 100 GB, < 1 TB, 1 TB+; series: # Files, # NetCDF Files, Total TB, NetCDF TB]

20 /glade/u file system is used for home file system, applications & tools directories
/glade/u File System (home space), August 2014; GPFS block size = 512 KB, GPFS sub-block size = 16 KB
Space used: 10 TB
File count: 37 million
[Chart: # of Files (Millions) and Total GB (GigaBytes) by file size; bins: < 512 B, < 16 KB, < 512 KB, < 100 MB, < 1 GB, < 10 GB, < 100 GB, < 1 TB, 1 TB+; series: # Files, # NetCDF Files, Total GB, NetCDF GB]
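The file-size histograms on the three file system slides bin files against the GPFS sub-block and block sizes. The sketch below shows one way to build such a histogram and to estimate the space a small file occupies when allocation is rounded up to whole sub-blocks. It is a simplified model: it assumes 1 KB = 1024 bytes, ignores any special handling of very small files, and uses invented file sizes and hypothetical function names.

```python
KB = 1024
MB = 1024 * KB


def allocated_bytes(file_size, sub_block_size):
    """Space consumed if data is allocated in whole sub-blocks (simplified)."""
    if file_size == 0:
        return 0
    sub_blocks = (file_size + sub_block_size - 1) // sub_block_size
    return sub_blocks * sub_block_size


def size_histogram(file_sizes, upper_bounds):
    """Count files per bin; bin i holds sizes below upper_bounds[i] that did
    not fit an earlier bin, and a final extra bin holds everything larger."""
    counts = [0] * (len(upper_bounds) + 1)
    for size in file_sizes:
        for i, bound in enumerate(upper_bounds):
            if size < bound:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return counts


# Invented file sizes, binned against a 128 KB sub-block and 4 MB block.
sizes = [100, 300, 2000, 200 * KB, 5 * MB]
counts = size_histogram(sizes, upper_bounds=[512, 128 * KB, 4 * MB])
overhead = allocated_bytes(100, 128 * KB) - 100
print(counts, overhead)
```

In this model a 100-byte file on a file system with a 128 KB sub-block still occupies a full 128 KB, which is why a population dominated by small files matters for space efficiency.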

21 DAV Resource Utilization is Low
Lifetime average utilization: Caldera 12.0%, Geyser 13.1%
There has been a slight uptrend in utilization of both DAV systems in recent months.
While the DAV resources are, in part, meant to be used interactively (and thus should not be routinely running at high % utilization), they remain relatively underutilized, particularly the Caldera GPU-accelerated computational system.

22 Profile of a typical CESM run
- Between 3.54 GB per node (2° resolution) and 7 GB per node (¼° resolution)
- 15 cores, 2 threads per core (not all CESM models are threaded, however)
- Best Yellowstone configuration for modest-sized runs (may not be true for all machines)
- The use of 16 cores appears to result in high OS noise (jitter) that reduces performance below the 15-core configuration. Active area of investigation.
- Largest cases may not use threading (effects on scalability being investigated)

23 I/O pattern of a typical CESM run shows lots of small files doing small block I/O
- Opens files (depending on configuration)
- Has aggregate I/O performance of MB/s
- Spends 3%-8% of runtime in I/O
- Most I/O operations are very small (< 100 Byte) POSIX file operations, but model output is written as ~512 KB chunks
- Analysis of GLADE/GPFS performance shows no bottlenecks in metadata, disk, or network I/O traffic
