San Diego Supercomputer Center: Best Practices and Policies
Giri Chukkapalli
Supercomputer Best Practices Symposium, May 11, 2005
Center's Mission
- Computational science vs. computer science research
- Computational science: supporting a single code, a single field, or a broad spectrum of fields
- Target existing users or grow new users
- Capacity vs. capability computing
- Can't be everything to everybody
- Mission statement and policy document
User Awareness
- Publicize the center's existing and upcoming compute and data capabilities well to the target user community
- This enables users to plan the types of problems they want to solve and develop codes that take advantage of the resources
- Otherwise, only the people who happen to know about a resource will make use of it
More Than Just a Large Supercomputer
- Supporting a broad computational science research community requires peripheral hardware, software, and personnel with a wide range of expertise
- A sizable shared-memory machine for pre- and post-processing
- A large compute farm to run embarrassingly parallel jobs
- Visualization engines
- SAN
Computing: One Size Doesn't Fit All
[Diagram: applications plotted by data capability (increasing I/O and storage) vs. compute capability (increasing FLOPS). Data-intensive extreme: data storage/preservation, EOL, CIPRes, NVO, and campus/departmental/desktop computing; compute-intensive "traditional HEC" extreme: protein folding, CPMD, QCD, climate, and ENZO simulation; the SDSC Data Science Env covers applications needing both, such as SCEC simulation and visualization, ENZO visualization, CFD, and extreme-I/O codes. These can't be done on the Grid (I/O exceeds the WAN): 1. 3D + time simulation, 2. out-of-core, distributed-I/O-capable codes.]
Data Movement
- Into and out of the center's SAN file system
- SAN to/from each compute platform's parallel file system
- Movement of data between compute, visualization, and pre/post-processing engines
- Automatic migration of data to/from archive
- Goal: bottleneck-free data flow
Pushing the Data-Intensive Envelope
[Diagram: today's leading-edge system vs. tomorrow's demands across the storage hierarchy. Compute: 15 TF vs. 100 TF; memory: 4 TB at 2 TB/s vs. 60 TB at 10 TB/s; parallel file system: 100 TB at 1 GB/s vs. 10 PB at 100 GB/s; data parking: 10 TB vs. 3 PB; archival tape system: 10 PB vs. 100 PB, with inter-tier bandwidths growing from the 100 MB/s-1 GB/s range to 10-100 GB/s.]
Various File Systems
- Small, backed-up /home file system
- Periodically purged fast parallel file system
- Parking file system: SAN file system with auto-migration to archive
- Possibly a non-backed-up, non-purged intermediate-size file system
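A periodic purge policy like the one above can be sketched as a script that walks the scratch file system and flags files untouched for longer than a retention window. The mount point and the 14-day window below are illustrative assumptions, not actual SDSC policy:

```python
import os
import time

PURGE_AGE_DAYS = 14             # assumed retention window, not SDSC policy
SCRATCH_ROOT = "/gpfs/scratch"  # hypothetical parallel-scratch mount point

def purge_candidates(root, age_days, now=None):
    """Yield files whose last access time is older than age_days."""
    cutoff = (now if now is not None else time.time()) - age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    yield path
            except OSError:
                continue  # file vanished or unreadable; skip it

if __name__ == "__main__":
    for path in purge_candidates(SCRATCH_ROOT, PURGE_AGE_DAYS):
        print("would purge:", path)  # dry run; a real purge would unlink
```

In practice a center would run something like this from cron, with per-project exemptions and advance warning emails before anything is actually deleted.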
Cyber Infrastructure
[Diagram: a layered cyber infrastructure stack. At the top, domain-specific complex systems (life sciences/bioinformatics, engineering/automotive/aircraft, environmental/climate/weather, astrophysics, etc.) sit on problem-solving environments (portals, UIs, web services). Beneath them, cyber infrastructure tools: libraries, grid middleware, bridge software, schedulers (the Globus layer), plus services such as Oracle and Tomcat. At the bottom, resource-specific hardware over a network/data-transport layer: vector/SMP systems, MPPs, loosely coupled clusters, workstations, data engines, web servers, sensors, and instruments, with their operating systems and compilers.]
SDSC DataStar
[Table: 187 total nodes = 11 p690 + 176 p655, split between 1.7 GHz and 1.5 GHz Power4+ parts (per-speed counts of 7, 5, and 171 in the original table).]
SANergy Data Movement
[Diagram: a p690 SANergy client connects via a federation switch (2 Gb) and 1 Gb x 4 links to the TeraGrid network and to Orion and the SANergy MDC. Metadata operations and NFS traffic travel over IP, while data operations go over 2 Gb x 4 links through the SAN switch infrastructure to SAM-QFS disk.]
[Diagram: HPSS and a Force10 E1200 switch connect DataStar's 11 p690s (SANergy clients) and 176 p655s through the SANergy server and 5 Brocade 12000 switches (1,408 2 Gb ports) to a Sun Fire 15K running SAM-QFS, SAN-GPFS, the ETF DB, 32 FC tape drives, and ~400 Sun FC disk arrays (~4,100 disks, 540 TB total).]
Compute Platform: Setup
- Small, identical test system; perform all upgrades on the test system first
- Shared interactive pool and batch pool
- Setting up a common environment: copydefaults, SoftEnv
- Setting up third-party tools, libraries, helper apps, and community codes
Compute Platform: Setup (cont.)
- Providing example codes, scripts, and configurations in /usr/local/apps/examples
- Providing a user interface to allocation management
Compute Platform: Allocations
- Compute and data allocations
- Understanding space-time resolution relationships
- Peer (rotating-body) review process
- Online system
- I am currently part of an NSF review committee; can provide more info if needed
Criteria for Machine Access
- Preliminary access for porting, benchmarking, and optimizing the user's code
- Single-CPU performance criterion (15%?)
- Scaling criterion (90% efficiency on half the machine?)
- If the criteria are not met, provide help and consulting
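The scaling criterion above can be checked mechanically from two timings. A minimal sketch, assuming strong scaling is measured from a small baseline run up to a half-machine run (the 90% threshold mirrors the figure on the slide; the function names are mine):

```python
def parallel_efficiency(t_base, p_base, t_large, p_large):
    """Strong-scaling efficiency: achieved speedup / ideal speedup."""
    speedup = t_base / t_large
    ideal = p_large / p_base
    return speedup / ideal

def meets_scaling_criterion(t_base, p_base, t_large, p_large, threshold=0.90):
    # threshold mirrors the 90%-on-half-the-machine criterion from the slide
    return parallel_efficiency(t_base, p_base, t_large, p_large) >= threshold

# Example: 1000 s on 64 CPUs vs. 130 s on 512 CPUs gives a 7.69x
# speedup against an ideal 8x, i.e. ~96% efficiency, which passes.
```

Codes that fall short of the threshold would then be routed to the help/consulting path rather than denied access outright.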
Compute Platform: Scheduling
- Higher priority for large-PE jobs
- Allowing longer run times for larger-PE jobs
- Weighting based on allocation size
- A good API for users to probe and interact with the scheduler
- Prologue and epilogue scripts to bring the system to a clean state
- Express, high, low, and backfill queues
- Optimizing for maximum throughput vs. quick turnaround
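A priority function along these lines might combine PE count, allocation size, and queue class. A sketch with invented weights (the coefficients and queue boosts are illustrative assumptions, not SDSC's actual scheduler formula):

```python
# Illustrative job-priority sketch: favor large-PE jobs and large allocations.
# The weights and queue boosts below are invented for illustration only.
QUEUE_BOOST = {"express": 1000, "high": 500, "low": 0}

def job_priority(pe_count, allocation_sus, queue="low",
                 pe_weight=2.0, alloc_weight=0.001):
    """Higher score = scheduled earlier. Larger jobs and bigger
    allocations rank higher, matching the policy on the slide."""
    return (pe_weight * pe_count
            + alloc_weight * allocation_sus
            + QUEUE_BOOST[queue])

jobs = [("small", 8, 50_000, "low"),
        ("large", 512, 200_000, "low"),
        ("urgent", 4, 10_000, "express")]
ranked = sorted(jobs, key=lambda j: job_priority(*j[1:]), reverse=True)
# the 512-PE job outranks the express job, which outranks the small job
```

A real scheduler would add aging, fair-share decay, and backfill on top of a base score like this; the point is only that the "reward big jobs" policy reduces to a monotone weight on PE count.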
Regression Tests
- A well-designed set of benchmarks and regression tests to monitor system correctness and performance
- Run them after preventive maintenance and after compiler/OS upgrades
- Provide access to login/interactive nodes during preventive maintenance (PM)
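A performance regression check of the kind described can be as simple as comparing each benchmark's measured time against a stored baseline within a tolerance. The benchmark names, baseline values, and 5% tolerance here are illustrative assumptions:

```python
# Compare benchmark timings against stored baselines after an upgrade.
# Names, baseline times, and the 5% tolerance are invented for illustration.
BASELINES = {"stream_triad": 12.0, "mpi_alltoall": 3.4, "npb_ft_classC": 88.5}

def check_regressions(measured, baselines, tolerance=0.05):
    """Return benchmarks that ran more than `tolerance` slower than baseline."""
    failures = {}
    for name, base in baselines.items():
        t = measured.get(name)
        if t is None:
            failures[name] = "missing result"
        elif t > base * (1 + tolerance):
            failures[name] = f"{(t / base - 1) * 100:.1f}% slower"
    return failures

after_upgrade = {"stream_triad": 12.1, "mpi_alltoall": 4.1, "npb_ft_classC": 88.0}
bad = check_regressions(after_upgrade, BASELINES)
# only mpi_alltoall (~20.6% slower) is flagged; the system is held
# out of production until flagged benchmarks are investigated
```

Baselines would be re-recorded deliberately after an upgrade is accepted, so that drift is always measured against a known-good state.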
Compute Platform: Life Cycle
- Friendly-user phase: a few expert users who can cope with instabilities
- Production phase: criteria for a machine to enter production are uptime, documentation, and stable accounting
- Terminal phase: when the next system goes into production, 2 or 3 users who can use the whole machine
Communicating with Users
- User guide, FAQ
- Periodic articles on tool usage and example applications
- Yearly week-long training
- Email and motd alerts
Consulting
- Ticketing system, phone consulting
- Quick analysis and optimization help: TOPs (Targeted Optimization and Porting) program
- Extended collaboration: Strategic Applications Collaboration (SAC)
- Modern tools like IM
Listening to Users
- Periodic, well-designed surveys
- User advisory committee
- Local internal users
- Listening while consulting
- The application space is moving from monolithic single-component analysis codes to multi-scale, multi-physics systems-simulation codes
Usage Analysis
- To see how well we are following the policies we set
DS p655 Usage by Node Count (4/1/04-5/1/05)
[Pie chart of usage share by job node count: 1 node, 6%; 2-3, 4%; 4, 6%; 5-7, 2%; 8, 15%; 9-15, 5%; 16, 9%; 17-31, 15%; 32, 9%; 33-63, 8%; 64, 10%; 65-123, 4%; 128, 6%; 129-176, 1%. There have been recent increases in the number of 128-node jobs.]
SDSC User Snapshot: 2004
- 286 active projects at 90 institutions
- 7 million SUs consumed on DataStar
- PIs funded by NSF, NIH, DOE, NASA, DOD, DARPA, AFOSR, ONR
[Chart: time awarded, by discipline]
PIs by Discipline
[Chart: distribution of PIs by discipline]
Time Awarded, by Discipline
[Chart: distribution of time awarded, by discipline]
Users Span the Nation
[Map: states with SDSC-allocated PIs]
SDSC Compute Resources
- DataStar: 1,628 Power4+ processors in IBM p655 and p690 nodes; 4 TB total memory; up to 2 GB/s I/O to disk
- TeraGrid Cluster: 512 Itanium2 IA-64 processors; 1 TB total memory
- Intimidata (BlueGene/L): 2,048 PowerPC processors; 128 I/O nodes; half a petabyte of GPFS
[Photo: Intimidata installation]
SDSC Data Resources
- 1 PB storage-area network (SAN)
- 6 PB StorageTek tape library
- DB2, Oracle, MySQL
- Storage Resource Broker (SRB)
- HPSS
- 72-CPU Sun Fire 15K
- 96-CPU IBM p690s
SDSC Top 10 Users (SUs consumed in 2004)
- Marvin Cohen, UC Berkeley (DataStar): 846,397 SUs
- Michael Norman, UC San Diego (DataStar): 551,969
- Juri Toomre, U Colorado (DataStar): 361,633
- Richard Klein, UC Berkeley (DataStar): 315,240
- J. Andrew McCammon, UCSD (DataStar): 310,909
- Klaus Schulten, UIUC (TeraGrid Cluster): 287,188
- George Karniadakis, Brown U (DataStar): 284,430
- Richard Klein, UC Berkeley (DataStar): 279,766
- Pui-Kuen Yeung, Georgia Tech (DataStar): 220,172
- Parviz Moin, Stanford U (DataStar): 188,391
SAC: ENZO (Robert Harkness)
- Reconstructing the first billion years
- 3D cosmological hydrodynamics code
- Generates TBs of data now; stresses network and data-movement limits
- Run anywhere, write data to SDSC with SRB
SAC: TeraShake (Yifeng Cui)
- Estimating the potential damage of a magnitude 7.7 Southern California earthquake
- Large-scale simulation of seismic wave propagation on the San Andreas Fault
- 1.8 billion grid points; 240 DataStar processors; 1 TB memory; 5-day run
- 2 GB/s continuous I/O; 47 TB output
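A quick sanity check on the I/O figures above: 47 TB written over a 5-day run averages only about 0.1 GB/s, so the quoted 2 GB/s continuous rate presumably applies during the I/O phases themselves rather than over the whole run (my inference, using decimal terabytes):

```python
# Average output rate implied by the TeraShake numbers on the slide.
TB = 1e12                         # decimal terabytes (assumption)
output_bytes = 47 * TB
runtime_s = 5 * 24 * 3600         # 5-day run
avg_rate = output_bytes / runtime_s
# ~1.09e8 bytes/s, i.e. roughly 0.1 GB/s sustained average,
# versus the 2 GB/s rate quoted for continuous I/O phases
```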
NVO Montage (Leesa Brieger)
- Compute-intensive service to deliver science-grade custom mosaics on demand, with requests made through existing portals
- 2MASS: a 10 TB, three-band infrared archive of the entire sky
- Compute-intensive generation of custom mosaics
- Possible to mosaic the whole sky into five-degree squares with ~1 week of TeraGrid time
BlueGene-Specific
- Better development environment; eliminate the need for cross-compilation (the current one is pretty ancient)
- Run the BGL kernel as a VM on the front end?
- BGL's special need for packing jobs onto contiguous chunks of nodes
- Special map files, mapping codes
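A map file of the kind mentioned assigns each MPI rank a torus coordinate. A sketch that generates one, assuming the BG/L convention of one "x y z t" line per rank; the 8x8x8 torus dimensions and x-fastest ordering here are illustrative, not a specific SDSC mapping:

```python
# Generate a BG/L-style map file: one "x y z t" line per MPI rank,
# giving that rank's torus coordinate and CPU within the node.
# The 8x8x8 dimensions and x-fastest ordering are illustrative.
def torus_map(nx=8, ny=8, nz=8, cpus_per_node=1):
    lines = []
    for t in range(cpus_per_node):
        for z in range(nz):
            for y in range(ny):
                for x in range(nx):
                    lines.append(f"{x} {y} {z} {t}")
    return lines

demo = torus_map(2, 2, 2)  # 8 ranks for a tiny 2x2x2 example
# demo starts "0 0 0 0", "1 0 0 0", "0 1 0 0", ...
```

Hand-built mappings like this matter on BG/L because placing communicating ranks on neighboring torus nodes can substantially change communication performance.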
BlueGene: Experience
- Extremely reproducible run times
- Extremely stable hardware
- Very poor single-processor performance (compiler? double-hummer SIMD not exploited)
- Computation/communication overlap still not tested
- Would like to operate in single-boot, multi-user mode
BlueGene: Experience (cont.)
Several SDSC codes ported:
- MPCUGLES: LES turbulence code
- PK's DNS turbulence code
- POP ocean model
- SPECFEM3D: seismic wave propagation
- AMBER: MD chemistry code
- ENZO: astrophysics code
- NAMD and CPMD came from IBM
BlueGene: Latest
- Half a petabyte of SATA file system attached to BGL through 64 IA-64 server nodes
- 3.2 GB/s reads and 2.8 GB/s writes
- 700 MB/s from a production code using 512 nodes